This article extends the scope of empirical likelihood methodology
in three directions: to allow for plug-in estimates of nuisance
parameters in estimating equations, slower than $\sqrt{n}$-rates of
convergence, and settings in which there are a relatively large number of
estimating equations compared to the sample size. Calibrating em- 
pirical likelihood confidence regions with plug-in is sometimes in- 
tractable due to the complexity of the asymptotics, so we introduce 
a bootstrap approximation that can be used in such situations. We 
provide a range of examples from survival analysis and nonparametric 
statistics to illustrate the main results. 

1. Introduction. Empirical likelihood [Owen (1990, 2001)] has tradition- 
ally been used for providing confidence regions for multivariate means and, 
more generally, for parameters in estimating equations, under various stan- 
dard assumptions: the number of estimating equations is fixed, they do not 
involve nuisance parameters, and the parameters of interest are estimable 
at $\sqrt{n}$-rate, where $n$ is the sample size. Under such assumptions and with
i.i.d. observations [or even dependent observations; see, e.g., Chapter 8 of 
Owen (2001)], empirical likelihood (EL) based confidence regions can be 
calibrated using a nonparametric version of Wilks's theorem involving a 
chi-squared limiting distribution. 

The aim of the present paper is to develop adaptations when the tradi- 
tional assumptions are violated. More specifically, under certain asymptotic 
stability conditions, we establish generalizations of the basic theorem of EL
to allow for plug-in estimates of nuisance parameters in the estimating
equations, for slower than $\sqrt{n}$-rates of convergence, and for i.i.d. settings in which
there are a relatively large number of estimating equations compared to the 
sample size. Several of our examples share the characteristic that they would 
be harder to analyze with other methods. In particular, the method of profile 
EL [see, e.g., Owen (2001), page 42] for dealing with nuisance parameters in 
estimating equations is often not applicable for infinite-dimensional nuisance 
parameters, and even when it is applicable, implementation can be compu- 
tationally difficult. The triangular array EL theorem of Owen [(2001), page 
85] applies under slower than $\sqrt{n}$-rates, and has been useful in the context
of nonparametric density estimation, for instance, but is not flexible enough 
to handle estimating functions with plug-in. 

The use of plug-in for nuisance parameters in EL confidence regions is not 
new. It has recently been applied in various survival analysis contexts; see 
Qin and Jing (2001a, 2001b), Wang and Jing (2001), Li and Wang (2003) 
and Qin and Tsao (2003). The technique has also been used in survey sam- 
pling with imputation for missing response; see Wang and Rao (2002). Our 
aim here, however, is to provide a more widely applicable version of this 
approach, that can accommodate a wide array of examples, allowing both 
plug-in and slower than $\sqrt{n}$-rates of convergence. We take the point of view
that it is preferable to derive a general result using generic assumptions, that 
can be checked in a large number of applications, rather than reinventing 
the basic theory on each occasion. Calibrating EL confidence regions with 
plug-in is sometimes intractable due to the complexity of the asymptotics, so 
we introduce a bootstrap approximation that can be used in such situations. 

To illustrate our general results we consider a range of examples from 
survival analysis and nonparametric statistics in settings where the infer- 
ence is based on estimating functions. In particular, we look at function- 
als of survival distributions with right censored data [treated via EL in 
Wang and Jing (2001)], the error distribution in nonparametric regression 
[Akritas and Van Keilegom (2001)], density estimation [treated by EL in 
Hall and Owen (1993) and Chen (1996)], and survival function estimation 
from current status data [van der Vaart and van der Laan (2006)]. 

Standard maximum likelihood theory for parametric models, as well as 
EL theory, keeps the dimension of the parameter (or the number of estimat- 
ing equations) fixed, say at p, as sample size n grows. This is what leads 
to asymptotic normality, Wilks type theorems for likelihood ratio statistics 
and Owen type theorems for EL. Portnoy (1986, 1988) and others have in- 
vestigated the extent to which maximum likelihood theory based results still 
hold, when $p$ is allowed to increase with $n$. The canonical growth
restriction for normal approximations to hold is that $p^2/n \to 0$, while
$p^{3/2}/n \to 0$ typically suffices for certain quadratic approximations
associated with Wilks theorems to hold.

In this article we investigate the similar problem of finding conditions 
under which the EL methods continue to work adequately when p grows. The 
canonical growth condition will be seen to be $p^3/n \to 0$. Under this condition,
in addition to other requirements that have to do with stability of eigenvalues 
of covariance matrices, minus twice the log-EL can be approximated well 
enough with a certain quadratic form that in itself is close to a $\chi^2_p$.

We should add that in situations with a high number of parameters the 
typical aim is not to provide a simultaneous confidence region for the full
parameter vector, say $(\mu_1, \ldots, \mu_p)$. It could rather be to test whether a subset
of the parameters have zero values, or to compare one distribution with
another, or, more generally, to make inference for a focus parameter of
dimension $q < p$, say $f(\mu_1, \ldots, \mu_p)$. For any linear map $f$, these tasks can be carried
out inside our framework for growing $p$ by constructing a $q$-dimensional
confidence region in which $q$ grows with $n$. For further discussion in the context
of a regression example, see Section 5.4. 

The paper is organized as follows. Section 2 develops the EL theory with 
plug-in and the bootstrap approximation of the limiting distribution of the 
EL statistic. Six examples, including two involving slower than $\sqrt{n}$-rates of
convergence, are discussed in Section 3. In Section 4 we examine the limiting 
behavior of the EL statistic in situations where the number of estimating 
functions is allowed to increase with growing sample size. Some examples 
are presented in Section 5, including setups with "growing polynomial re- 
gression" and "growing exponential families." Proofs can be found in the 
Appendix. 

2. Plug-in empirical likelihood. We first describe the general framework. 
The basic idea of empirical likelihood (EL) is to regard the observations 
$X_1, \ldots, X_n$ as if they are i.i.d. from a fixed and unknown $d$-dimensional
distribution $P$, and to model $P$ by a multinomial distribution concentrated
on the observations. Inference for the parameter(s) of interest,
$\theta_0 = \theta(P) \in \Theta$, is then carried out using a $p$-dimensional estimating
function of the form $m_n(X, \theta, h)$, where, for the purposes of the present
paper, $h$ is a (possibly infinite-dimensional) "nuisance" parameter with
unknown true value $h_0 = h(P) \in \mathcal{H}$.

When $h_0$ is known, it can replace $h$ in the EL ratio function

$$\mathrm{EL}_n(\theta, h) = \max\Bigl\{ \prod_{i=1}^n (n w_i) : \text{each } w_i > 0,\ \sum_{i=1}^n w_i = 1,\ \sum_{i=1}^n w_i m_n(X_i, \theta, h) = 0 \Bigr\},$$

leading to a confidence region $\{\theta : \mathrm{EL}_n(\theta, h_0) > c\}$ for $\theta_0$, where $c$ is a
suitable positive constant, and the maximum of the empty set is defined to be
zero. The constant $c$ can be calibrated using Owen's (1990) EL theorem,
provided $m_n = m$ does not depend on $n$: if the observations are i.i.d. and
$m(X, \theta_0, h_0)$ has zero mean and a positive definite covariance matrix, then
$-2 \log \mathrm{EL}_n(\theta_0, h_0) \to_d \chi^2_p$, where $\chi^2_p$ has a chi-squared distribution with $p$
degrees of freedom.
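
As a concrete illustration of the EL ratio above, the following is a minimal sketch for the scalar case $p = 1$ with $m(X, \theta) = X - \theta$ and no nuisance parameter, where the maximization reduces to a one-dimensional search for a Lagrange multiplier. The function names `neg2_log_el` and `in_el_region`, the bisection solver and the toy data are assumptions made for this example only; they are not taken from the paper. The cutoff 3.841459 is the 0.95 quantile of $\chi^2_1$, in line with Owen's calibration.

```python
import math

CHI2_1_95 = 3.841459  # 0.95 quantile of the chi-squared distribution with 1 d.f.


def neg2_log_el(values, iters=200):
    """Return -2 log EL for scalar values m_1..m_n under sum_i w_i m_i = 0.

    Returns math.inf when zero is not strictly between min and max,
    which is the case EL = 0 (maximum over the empty set).
    """
    lo_v, hi_v = min(values), max(values)
    if not (lo_v < 0.0 < hi_v):
        return math.inf
    # The multiplier must keep every 1 + lam * m_i positive.
    lo, hi = -1.0 / hi_v, -1.0 / lo_v
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The score is strictly decreasing in the multiplier on (lo, hi).
        score = sum(m / (1.0 + mid * m) for m in values)
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * m) for m in values)


def in_el_region(data, theta, cutoff=CHI2_1_95):
    """True if theta lies in {theta : -2 log EL <= cutoff} for the mean."""
    return neg2_log_el([x - theta for x in data]) <= cutoff


print(neg2_log_el([1.0, 1.0, -1.0]))
print(in_el_region([0.2, 1.1, 0.7, 1.9, 1.4], 1.0))
```

The region $\{\theta : \mathrm{EL}_n(\theta) > c\}$ corresponds to $c = \exp(-\text{cutoff}/2)$ in this sketch; the EL ratio is unchanged when the estimating function is rescaled, so no $\sqrt{n}$ factor is needed here.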

2.1. Main result. We now establish a generalization of Owen's result in 
which the unknown $h_0$ is replaced by an estimator $\hat h$, and the estimating
function is allowed to depend on $n$. This result will provide a way of
calibrating $\{\theta : \mathrm{EL}_n(\theta, \hat h) > c\}$ as a confidence region for $\theta_0$. We extract the basic
structure of Owen's result, and only impose an existence condition, (A0)
below, and some "generic" asymptotic stability conditions, (A1)-(A3) below.
These conditions ensure a nondegenerate limiting distribution, but do not
require i.i.d. observations or consistency of $\hat h$, although such structure may
very well be helpful for checking the conditions in specific applications. Our 
proof (placed in the Appendix) uses tools somewhat different from those 
usually employed in the EL literature, as in, for example, Owen (2001), 
Chapter 11; see also Remark 2.7 below. 

We use the following notation throughout. For vectors $v$, let $\|v\|$
denote the Euclidean norm, and $v^{\otimes 2} = v v^t$. For matrices $V = (v_{ij})$, let
$|V| = \max_{i,j} |v_{ij}|$.

Let $\{a_n\}$ be a sequence of positive constants bounded away from zero, and
$U$ a nondegenerate $p$-dimensional random vector. In most of the applications
we consider, $a_n = 1$ and $U \sim N_p(0, V_1)$, where the covariance matrix $V_1$ is
positive definite, but the extra generality can be useful in some applications.
Let $V_2$ denote a $p \times p$ positive definite covariance matrix. The following
conditions are needed:

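(A0) $P\{\mathrm{EL}_n(\theta_0, \hat h) = 0\} \to 0$.
(A1) $\sum_{i=1}^n m_n(X_i, \theta_0, \hat h) \to_d U$.
(A2) $a_n \sum_{i=1}^n m_n^{\otimes 2}(X_i, \theta_0, \hat h) \to_{pr} V_2$.
(A3) $a_n \max_{1 \le i \le n} \|m_n(X_i, \theta_0, \hat h)\| \to_{pr} 0$.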

As pointed out by a referee, $\hat h$ just plays the role of indicating that $m_n$ is
being estimated, and we could replace $m_n(X, \theta, \hat h)$ by the simpler notation
$\hat m_n(X, \theta)$. This also covers situations in which $\hat h$ depends on $\theta$ with an
estimating function of the form $m_n(X, \theta, \hat h_\theta)$. We prefer to include $\hat h$ explicitly in
the notation, however, because all our examples involve a plug-in estimator,
as does our bootstrap result in Section 2.3.

Condition (A0) is equivalent to $P(0 \in C_n) \to 1$, where $C_n$ denotes the
interior of the convex hull of $\{m_n(X_i, \theta_0, \hat h), i = 1, \ldots, n\}$ and $0$ is the zero
vector in $\mathbb{R}^p$. This is the basic existence condition needed for EL to be
useful in our general setting. Below we describe how the EL statistic can be
expressed, up to a negligible remainder term, as a quadratic form involving
the left-hand sides of (A1) and (A2), so these conditions play a natural role
in the asymptotics (see Remark 2.7). Finally, (A3) is required to obtain the
negligibility of the remainder term. For the practical verification of these
conditions, we refer the reader to Section 3, where they are checked in detail 
in a number of applications. 

Theorem 2.1. If (A0)-(A3) hold, then $-2 a_n^{-1} \log \mathrm{EL}_n(\theta_0, \hat h) \to_d U^t V_2^{-1} U$.

2.2. Remarks. This theorem is related to many results in the literature, 
which we now discuss, along with a sketch of its proof; the complete proof 
appears in the Appendix. 

Remark 2.1. Owen's EL theorem follows from Theorem 2.1 by taking 
$a_n = 1$ and $m_n = m/\sqrt{n}$. Indeed, (A0) then holds using an argument
involving the Glivenko-Cantelli theorem over half-spaces [see page 219 of Owen
(2001)], (A1) by the multivariate central limit theorem, (A2) by the law
of large numbers, and (A3) by a Borel-Cantelli argument [Lemma 11.2 of 
Owen (2001)]. 

Remark 2.2. When $U \sim N_p(0, V_1)$ with $V_1$ positive definite, the limit
distribution above may be expressed as $r_1 \chi^2_{1,1} + \cdots + r_p \chi^2_{1,p}$, where the
$\chi^2_{1,j}$'s are independent chi-squared random variables with one degree of freedom
and the weights $r_1, \ldots, r_p$ are the eigenvalues of $V_2^{-1} V_1$; cf. Lemma 3 of
Qin and Jing (2001a). If, in addition, $V_1$ and $V_2$ coincide, we have the
standard $\chi^2_p$ limit distribution. When $V_1$ and $V_2$ are not identical, the weights
$r_1, \ldots, r_p$ may need to be estimated, for example via consistent estimators $\hat V_1$,
$\hat V_2$ and computing the eigenvalues of $\hat V_2^{-1} \hat V_1$. It is not possible to say anything
in general about estimation of $V_1$, which will depend on the structure of the
specific application; later in this section we examine a bootstrap approach
which can be applied when $V_1$ is difficult to estimate by other means. For
$a_n = 1$, an estimator of $V_2$ is easily provided given plug-in of a consistent
estimator $\hat\theta$ for $\theta_0$. In the Appendix we show that
$\hat V_2 = \sum_{i=1}^n m_n^{\otimes 2}(X_i, \hat\theta, \hat h)$ consistently estimates $V_2$
under the following two additional conditions: there
exists a $p \times p$-matrix-valued function $V(\theta, h)$ such that

(A4) For some subset $\mathcal{H}_1$ of $\mathcal{H}$ such that $P\{\hat h \in \mathcal{H}_1\} \to 1$, and for some
$\delta > 0$,

$$\sup_{\|\theta - \theta_0\| < \delta,\, h \in \mathcal{H}_1} \Bigl| \sum_{i=1}^n m_n^{\otimes 2}(X_i, \theta, h) - V(\theta, h) \Bigr| \to_{pr} 0;$$

(A5) $\sup_{\|\theta - \theta_0\| < \delta_n,\, h \in \mathcal{H}_1} |V(\theta, h) - V(\theta_0, h)| \to 0$ for any real sequence $\delta_n \downarrow 0$.

When the observations are i.i.d. and $m_n = m/\sqrt{n}$ for some function
$m(X, \theta, h)$ that does not depend on $n$, we would expect to use
$V(\theta, h) = E m^{\otimes 2}(X_1, \theta, h)$ and then (A4) amounts to a (convergence-in-probability)
version of the Glivenko-Cantelli property for
$\mathcal{F} = \{m^{\otimes 2}(\cdot, \theta, h) : \|\theta - \theta_0\| < \delta, h \in \mathcal{H}_1\}$.
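
The weights in Remark 2.2 can be computed directly once estimates of $V_1$ and $V_2$ are available. The following is a minimal sketch for $p = 2$ using only the standard library. The helper names `weights_2x2` and `weighted_chisq_quantile` are hypothetical, and the Monte Carlo quantile is simply one convenient way (an assumption of this example, not a procedure from the paper) to calibrate against the weighted chi-squared limit.

```python
import math
import random


def weights_2x2(v1, v2):
    """Eigenvalues of V2^{-1} V1 for 2x2 symmetric positive definite inputs."""
    (a, b), (c, d) = v2
    det_v2 = a * d - b * c
    inv = [[d / det_v2, -b / det_v2], [-c / det_v2, a / det_v2]]
    prod = [[sum(inv[i][k] * v1[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    tr = prod[0][0] + prod[1][1]
    det = prod[0][0] * prod[1][1] - prod[0][1] * prod[1][0]
    # The eigenvalues are real; max(...) guards against rounding below zero.
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 - disc, tr / 2.0 + disc


def weighted_chisq_quantile(weights, level=0.95, draws=20000, seed=0):
    """Monte Carlo quantile of sum_j r_j * chi^2_{1,j} with independent terms."""
    rng = random.Random(seed)
    sims = sorted(sum(r * rng.gauss(0.0, 1.0) ** 2 for r in weights)
                  for _ in range(draws))
    return sims[min(int(level * draws), draws - 1)]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(weights_2x2([[2.0, 0.0], [0.0, 3.0]], identity))
print(weighted_chisq_quantile(weights_2x2(identity, identity)))
```

When the two matrices coincide both weights equal one and the simulated cutoff approximates the usual $\chi^2_2$ quantile.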

Remark 2.3. For i.i.d. observations and $m_n = m/\sqrt{n}$, with $m(X, \theta_0, h_0)$
having zero mean and a finite covariance matrix $V_0$, the multivariate central
limit theorem implies that $\sum_{i=1}^n m_n(X_i, \theta_0, h_0)$ tends to $N_p(0, V_0)$, so
condition (A1) describes the perturbation of $V_0$ due to replacing $h_0$ by $\hat h$.
In the "highly smooth" case that $M(\theta_0, \hat h) = o_{pr}(n^{-1/2})$, where
$M(\theta, h) = E m(X, \theta, h)$, it can be shown (under some additional assumptions) that there
is no perturbation: $V_1 = V_0$. For instance, suppose that the class of
functions $\{m(\cdot, \theta_0, h) : h \in \mathcal{H}\}$ is Donsker, and $\hat h$ is consistent in the sense that
$\rho_j(\hat h, h_0) \to_{pr} 0$ for $j = 1, \ldots, p$, where $\rho_j(h, h_0)$ is the $L^2(P)$ distance between
$m_j(X, \theta_0, h)$ and $m_j(X, \theta_0, h_0)$. Then

$$\sum_{i=1}^n m_n(X_i, \theta_0, \hat h) = n^{-1/2} \sum_{i=1}^n \{m(X_i, \theta_0, \hat h) - M(\theta_0, \hat h)\} + \sqrt{n}\, M(\theta_0, \hat h)$$

tends to $N_p(0, V_0)$, so $V_1 = V_0$, where empirical process theory is used to
obtain weak convergence of the first term; cf. van der Vaart (1998), page 280.
However, $M(\theta_0, \hat h) = o_{pr}(n^{-1/2})$ is a strong condition, so we have avoided
using it in favor of the less restrictive condition (A1), which is flexible enough
to be checked within the context of the examples considered in the next 
section. 

Remark 2.4. Kitamura (1997) introduces blockwise EL with estimating
functions, without plug-in, in models having weakly dependent stationary
observations. The maximum EL estimator under blocking is shown to
have greater efficiency than the standard maximum EL estimator, but the
blockwise approach has not been extended to allow plug-in. Standard EL
(with plug-in), however, can still provide accurate confidence sets under
dependent observations, for according to Theorem 2.1 the limiting
distribution of the standard EL statistic, while not chi-square, is of a tractable
form. If $m_n = m/\sqrt{n}$ and there is no plug-in, conditions (A1) and (A2) can
be checked by central limit theorems and ergodic theorems for weakly
dependent sequences. Condition (A3) holds provided $E\|m(X, \theta_0)\|^2 < \infty$ by a
Borel-Cantelli argument [cf. Owen (2001), Lemma 11.2]. For an estimating
function $m(X, \theta)$ such that $E m(X, \theta_0) = 0$, the limiting distribution of the
EL statistic is as in Remark 2.2 with
$V_1 = \sum_{i=-\infty}^{\infty} \mathrm{Cov}\{m(X_1, \theta_0), m(X_i, \theta_0)\}$
and $V_2 = \mathrm{Var}\{m(X, \theta_0)\}$, which could be estimated easily.
Remark 2.5. Nordman, Sibbertsen and Lahiri (2007) develop blockwise
EL for the mean of the long-range dependent (stationary and ergodic)
process $X_i = G(Z_i)$, where $\{Z_i\}$ is a stationary sequence of $N(0, 1)$ random
variables such that $\mathrm{cov}(Z_i, Z_{i+n}) = n^{-\alpha} L(n)$, for some $0 < \alpha < 1$ and slowly
varying $L(\cdot)$, and $G(\cdot)$ is a Borel function with $G(Z_1)$ having finite mean $\theta_0$
and finite variance $\sigma^2$. Suppose that $\alpha$, $L(\cdot)$ and $G(\cdot) - \theta_0$ are known and
we use an estimating function of the form $m_n(X_i, \theta) = b_n (X_i - \theta)$, where
$b_n$ depends on the rate of convergence of the sample mean of the $X_i$.
Condition (A1) is checked using a result of Taqqu (1975), which shows that
$b_n \sum_{i=1}^n (X_i - \theta_0) \to_d U$ if we specify $b_n = n^{\alpha/2 - 1} L(n)^{-1/2}$. Here $U$ is defined
by a multiple Wiener integral and does not depend on $\theta_0$. Condition (A2) is
checked by setting $a_n = n^{-1} b_n^{-2} = n^{1 - \alpha} L(n)$ and using the ergodic theorem:

$$a_n \sum_{i=1}^n m_n(X_i, \theta_0)^2 = n^{-1} \sum_{i=1}^n (X_i - \theta_0)^2 \to_{pr} \sigma^2 = V_2.$$

In this case the choice of $a_n$ tends to infinity, and it is not possible to arrange
$a_n = 1$.
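
The rate bookkeeping in this remark can be checked in a few lines. With $m_n(X_i, \theta) = b_n (X_i - \theta)$ and the stated choices of $a_n$ and $b_n$, a short worked computation (a sketch that only rearranges the quantities defined above) reads:

```latex
a_n b_n^2 = n^{-1} b_n^{-2} \, b_n^2 = n^{-1},
\qquad
a_n \sum_{i=1}^n m_n(X_i, \theta_0)^2
  = a_n b_n^2 \sum_{i=1}^n (X_i - \theta_0)^2
  = n^{-1} \sum_{i=1}^n (X_i - \theta_0)^2 ,

b_n^{-2} = \bigl( n^{\alpha/2 - 1} L(n)^{-1/2} \bigr)^{-2} = n^{2 - \alpha} L(n)
\quad\Longrightarrow\quad
a_n = n^{-1} b_n^{-2} = n^{1 - \alpha} L(n) .
```

Since $0 < \alpha < 1$, the exponent $1 - \alpha$ is positive, which is why $a_n$ grows without bound here.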

Remark 2.6. In the special case that the nuisance parameter $h$ is finite
dimensional, the profile EL statistic

$$-2 \log \Bigl\{ \max_h \mathrm{EL}_n(\theta_0, h) \Big/ \max_{\theta, h} \mathrm{EL}_n(\theta, h) \Bigr\} \to_d \chi^2_q$$

under various regularity conditions [Qin and Lawless (1994), Corollary 5],
where $q$ is the dimension of $\theta$. This provides an attractive method of
obtaining an EL confidence region for $\theta$, and is easier than using plug-in, but
it is restricted to finite-dimensional nuisance parameters and the estimating
function needs to be differentiable in $(\theta, h)$. Bertail (2006) extended this
approach to infinite-dimensional $h$ in some "highly smooth" cases (cf.
Remark 2.3).

Remark 2.7. Our proof of Theorem 2.1 differs from the usual EL approach
in that we take the dual problem perspective; see, for example,
Cristianini and Shawe-Taylor (2000), Section 5.2, for the relevant convex
optimization theory. An outline of the proof is as follows. Write
$X_{n,i} = m_n(X_i, \theta_0, \hat h)$. By (A0), with probability tending to 1,
$\mathrm{EL}_n = \mathrm{EL}_n(\theta_0, \hat h) = \prod_{i=1}^n (1 + \hat\lambda^t X_{n,i})^{-1}$,
where the $p$-vector of Lagrange multipliers $\hat\lambda$ satisfies
$\sum_{i=1}^n X_{n,i} / (1 + \hat\lambda^t X_{n,i}) = 0$, as in Owen (2001), page 219. Thus, with
probability tending to 1, we can express the EL statistic in dual form as

(1) $$-2 \log \mathrm{EL}_n = G_n(\hat\lambda) = \sup_{\lambda} G_n(\lambda),$$

where $G_n(\lambda) = 2 \sum_{i=1}^n \log(1 + \lambda^t X_{n,i})$, and the domain of $G_n$ is the set on
which it is defined (regarding $\log x$ as undefined for $x \le 0$). Note here that $G_n$
is concave and achieves its maximum at $\hat\lambda$ since $\nabla G_n(\hat\lambda) = 0$. Now consider
the following quadratic approximation to $G_n$:

$$G_n^*(\lambda) = 2 \lambda^t U_n - \lambda^t V_n \lambda, \quad \text{where } U_n = \sum_{i=1}^n X_{n,i}, \quad V_n = \sum_{i=1}^n X_{n,i}^{\otimes 2},$$

and the domain of $G_n^*$ is taken as the whole of $\mathbb{R}^p$. We show in our Appendix
that the difference between the maxima of $G_n$ and $G_n^*$ (over their respective
domains) is of order $o_{pr}(a_n)$. Thus, by (1) and the fact that $G_n^*$ is maximized
at $\lambda^* = V_n^{-1} U_n$ when $V_n$ is invertible (which happens with probability
tending to 1), it follows that

(2) $$-2 a_n^{-1} \log \mathrm{EL}_n = a_n^{-1} \sup_{\lambda} G_n^*(\lambda) + o_{pr}(1) = U_n^t (a_n V_n)^{-1} U_n + o_{pr}(1),$$

which tends in distribution to $U^t V_2^{-1} U$, via assumptions (A1) and (A2).
It also follows from the proof that Theorem 2.1 continues to hold in cases
where $(U_n, a_n V_n) \to_d (U, V_2)$, with a random rather than a fixed $V_2$.
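
The role of (A3) in the outline above can be seen from a second-order expansion of the logarithm. The following is a heuristic sketch under the assumption that $\max_i |\lambda^t X_{n,i}| \le 1/2$; it is not the argument used in the Appendix, and the remainder symbol $R_n$ is introduced here only for illustration.

```latex
% Elementary bound: |\log(1 + x) - x + x^2/2| \le |x|^3 for |x| \le 1/2.
G_n(\lambda)
  = 2 \sum_{i=1}^n \log(1 + \lambda^t X_{n,i})
  = 2 \lambda^t U_n - \lambda^t V_n \lambda + R_n(\lambda),
\qquad
|R_n(\lambda)| \le 2 \sum_{i=1}^n |\lambda^t X_{n,i}|^3
  \le 2 \max_{1 \le i \le n} |\lambda^t X_{n,i}| \; \lambda^t V_n \lambda .

% Maximizing the quadratic part G_n^*(\lambda) = 2 \lambda^t U_n - \lambda^t V_n \lambda:
\nabla G_n^*(\lambda) = 2 U_n - 2 V_n \lambda = 0
  \;\Longrightarrow\; \lambda^* = V_n^{-1} U_n,
\qquad
G_n^*(\lambda^*) = U_n^t V_n^{-1} U_n .
```

The remainder is thus controlled by the largest $|\lambda^t X_{n,i}|$, which is the quantity that a condition of the type (A3) keeps small.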

2.3. Bootstrap calibration. As mentioned above, the estimation of $V_1$ can
be difficult in certain situations and, more seriously, $U$ may not be normally
distributed, in which case a bootstrap calibration is desirable. The procedure
developed below consists in replacing $U$ by a bootstrap approximation, and
in consistently estimating $V_2$.

We restrict attention to i.i.d. data and $m_n = m/\sqrt{n}$. Assume that $M(\theta, h_0) = 0$
if and only if $\theta = \theta_0$, where $M(\theta, h) = E m(X, \theta, h)$, and denote
$M_n(\theta, h) = n^{-1} \sum_{i=1}^n m(X_i, \theta, h)$. Let $\{X_1^*, \ldots, X_n^*\}$ be drawn randomly with
replacement from $\{X_1, \ldots, X_n\}$, let $\hat h^*$ be the same as $\hat h$ but based on the bootstrap
data, and define $M_n^*(\theta, h) = n^{-1} \sum_{i=1}^n m(X_i^*, \theta, h)$. Also, let $\hat\theta$ be a consistent
estimator of $\theta_0$, and $\hat V_2 = n^{-1} \sum_{i=1}^n m^{\otimes 2}(X_i, \hat\theta, \hat h)$.

We use the abbreviated notation $\Delta_n = M_n - M$, as a function of $(\theta, h)$,
and $\Delta_n^*$ denotes the bootstrap version of $\Delta_n$ (here and in the sequel we define
the bootstrap version of any statistic as the expression obtained by
replacing $M$, $M_n$, $\theta_0$, $h_0$ and $\hat h$ by $M_n$, $M_n^*$, $\hat\theta$, $\hat h$ and $\hat h^*$, resp.). Let $\mathcal{H}$ be a vector
space of functions endowed with a pseudo-metric $\|\cdot\|_{\mathcal{H}}$, which is a sup-norm
metric with respect to the $\theta$-argument and a pseudo-metric with respect to
all the other arguments. Also let
$\Phi_n = \sqrt{n}\{\Delta_n(\theta_0, h_0) + \Gamma(\theta_0, h_0)[\hat h - h_0]\}$,
where $\Gamma(\theta_0, h_0)[\hat h - h_0]$ is the Gâteaux derivative of $M(\theta_0, h_0)$ in the
direction $\hat h - h_0$ [see, e.g., Bickel, Klaassen, Ritov and Wellner (1993), page 453].
The bootstrap analogue of $\Phi_n$ is denoted by $\Phi_n^*$. Finally, let $P^*$ denote the
bootstrap distribution conditional on the data. The following conditions are
needed to formulate the validity of the bootstrap approximation:
(B1) $\sup_{t \in \mathbb{R}^p} |P^*\{\Phi_n^* \le t\} - P\{\Phi_n \le t\}| \to_{pr} 0$.

(B2) $\sup_{\|\theta - \theta_0\| \le \delta_n,\, \|h - h_0\|_{\mathcal{H}} \le \delta_n} \|\Delta_n(\theta, h) - \Delta_n(\theta_0, h_0)\| = o_{pr}(n^{-1/2})$ for all
$\delta_n \downarrow 0$.

(B3) $\|M(\theta_0, h) - M(\theta_0, h_0) - \Gamma(\theta_0, h_0)[h - h_0]\| \le c \|h - h_0\|_{\mathcal{H}}^2$ for some
$c > 0$.

(B4) $\|\hat h - h_0\|_{\mathcal{H}} = o_{pr}(n^{-1/4})$.

(B5) The bootstrap analogues of conditions (B2)-(B4) hold pr-a.s.

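Theorem 2.2. Under conditions (A0)-(A5) and (B1)-(B5),

$$\sup_{t} \Bigl| P^*\bigl\{ n [M_n^*(\hat\theta, \hat h^*) - M_n(\hat\theta, \hat h)]^t \hat V_2^{-1} [M_n^*(\hat\theta, \hat h^*) - M_n(\hat\theta, \hat h)] \le t \bigr\} - P\{-2 \log \mathrm{EL}_n(\theta_0, \hat h) \le t\} \Bigr| \to_{pr} 0.$$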

Remark 2.8. When $\hat\theta$ is defined as the minimizer of $\|M_n(\theta, \hat h)\|$,
sufficient conditions for $\hat\theta$ to be consistent can be found in Theorem 1 in
Chen, Linton and Van Keilegom (2003). In order to verify condition (B2) in
the case of i.i.d. observations, it suffices by Corollary 2.3.12 in
van der Vaart and Wellner (1996) to show that the class
$\{m(\cdot, \theta, h) : \theta \in \Theta, h \in \mathcal{H}\}$ is Donsker, and that

$$\mathrm{Var}\{m(X, \theta, h) - m(X, \theta_0, h_0)\} \le K_1 \|\theta - \theta_0\| + K_2 \|h - h_0\|_{\mathcal{H}} + \varepsilon_n$$

for some $K_1, K_2 > 0$, and for some $\varepsilon_n \downarrow 0$. The former condition can be
verified by making use of Theorem 3 in Chen, Linton and Van Keilegom
(2003). The bootstrap analogue of (B2) then follows from Giné and Zinn
(1990), provided

$$\mathrm{Var}^*\{m(X^*, \theta, h) - m(X^*, \hat\theta, \hat h)\} \le K_1' \|\theta - \hat\theta\| + K_2' \|h - \hat h\|_{\mathcal{H}} + \varepsilon_n'$$

for some $K_1', K_2' = O(1)$ a.s. and for some $\varepsilon_n' = o(1)$ a.s. Finally, condition
(B3) and its bootstrap version can often be verified by using a two-term
Taylor expansion of $M(\theta_0, \hat h)$ and of $M_n(\hat\theta, \hat h^*)$ around $h_0$ and $\hat h$, respectively.
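
The bootstrap statistic appearing in Theorem 2.2 can be sketched in a few lines for a scalar estimating function ($p = 1$). The function name `bootstrap_quantile` and the toy ingredients `m_toy`, `fit_mean` and `fit_prob` are hypothetical and chosen only to exercise the mechanics; whether conditions (B1)-(B5) hold for this toy choice is not checked here.

```python
import random


def bootstrap_quantile(data, m, fit_h, fit_theta, level=0.95, n_boot=500, seed=0):
    """Quantile of n (M*_n - M_n)^2 / V2_hat over bootstrap resamples (p = 1)."""
    rng = random.Random(seed)
    n = len(data)
    h_hat = fit_h(data)
    theta_hat = fit_theta(data, h_hat)
    scores = [m(x, theta_hat, h_hat) for x in data]
    m_n = sum(scores) / n                     # M_n(theta_hat, h_hat)
    v2_hat = sum(s * s for s in scores) / n   # V2_hat in the scalar case
    stats = []
    for _ in range(n_boot):
        star = [data[rng.randrange(n)] for _ in range(n)]
        h_star = fit_h(star)                  # plug-in recomputed on the resample
        m_star = sum(m(x, theta_hat, h_star) for x in star) / n
        stats.append(n * (m_star - m_n) ** 2 / v2_hat)
    stats.sort()
    return stats[min(int(level * n_boot), n_boot - 1)]


# Toy ingredients (hypothetical): theta = P(X <= mean of X), nuisance h = mean.
def m_toy(x, theta, h):
    return (1.0 if x <= h else 0.0) - theta


def fit_mean(sample):
    return sum(sample) / len(sample)


def fit_prob(sample, h):
    return sum(1.0 for x in sample if x <= h) / len(sample)


toy = [0.3, 1.2, 0.8, 2.5, 0.1, 1.9, 0.6, 3.4, 1.1, 0.9]
cutoff = bootstrap_quantile(toy, m_toy, fit_mean, fit_prob)
print(cutoff)
```

Under the theorem's conditions, a value such as `cutoff` would serve as the threshold for $-2 \log \mathrm{EL}_n(\theta, \hat h)$ when forming the confidence region, in place of a chi-squared quantile.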

3. Applications of the plug-in theory. This section gives six illustrations 
of the preceding plug-in theory. The first uses parametric plug-in for a non- 
parametric estimand while the five others effectively use nonparametric plug- 
in to solve nonparametric empirical likelihood problems. The last two are 
examples of situations where the rate of convergence of the estimator of $\theta_0$
is slower than the usual root-$n$ rate. All the examples use $a_n = 1$.

3.1. Symmetric distribution functions. Let $F$ be a continuous distribution
function of a random variable $X$, that is symmetric about an unknown
location $a$, so $F(x) = 1 - F(2a - x)$ for all $x$. Consider estimation of $\theta_0 = F(x)$
at a fixed $x$ from $n$ i.i.d. observations from $F$. The estimating function has
$p = 2$ components (the first being the usual estimating function and the
second making use of the symmetry assumption): $m_n = n^{-1/2} m$, with

$$m(X, \theta, a) = \begin{pmatrix} I\{X \le x\} - \theta \\ I\{X > 2a - x\} - \theta \end{pmatrix}.$$

The plug-in estimator of $a$ is taken as the sample median $\hat a$. Let
$\eta_0 = \min(\theta_0, 1 - \theta_0)$ and suppose $0 < \theta_0 < 1$. Condition (A2) holds and

$$V_2 = \begin{pmatrix} \theta_0 (1 - \theta_0) & -\eta_0^2 \\ -\eta_0^2 & \theta_0 (1 - \theta_0) \end{pmatrix}$$

when $\theta_0 \ne 1/2$, and $V_2$ is singular when $\theta_0 = 1/2$. A consistent estimator of
$V_2$ is obtained by replacing $\theta_0$ by $\hat F(x)$, where $\hat F$ is the empirical distribution
function of $X$. The validity of condition (A3) is straightforward. Now, let us
turn to condition (A1). First note that

$$\begin{aligned}
\sqrt{n}\{1 - \hat F(2\hat a - x) - \theta_0\}
&= \sqrt{n}\{1 - \hat F(2a - x) - F(2\hat a - x) + F(2a - x) - \theta_0\} + o_P(1) \\
&= \sqrt{n}\{1 - \hat F(2a - x) - \theta_0\} - 2 f(2a - x) \sqrt{n}(\hat a - a) + o_P(1) \\
&= \sqrt{n}\{1 - \hat F(2a - x) - \theta_0\} - 2 f(x) f(a)^{-1} \sqrt{n}\{\hat F(a) - 1/2\} + o_P(1)
\end{aligned}$$

provided $f(a) > 0$, and hence $n^{-1/2} \sum_{i=1}^n m(X_i, \theta_0, \hat a)$ is asymptotically
normal from the Cramér-Wold device and the central limit theorem. It is easily
seen that the asymptotic variance matrix $V_1$ is given by

$$V_1 = \begin{pmatrix}
\theta_0 (1 - \theta_0) & -\eta_0^2 - f(x) f(a)^{-1} \eta_0 \\
-\eta_0^2 - f(x) f(a)^{-1} \eta_0 & \theta_0 (1 - \theta_0) + f(x)^2 f(a)^{-2} + 2 f(x) f(a)^{-1} \eta_0
\end{pmatrix}.$$

The elements of this matrix can be estimated by replacing $\theta_0$ by $\hat F(x)$ and
plugging in kernel estimators for $f(x)$ and $f(a)$.

Finally, we check condition (A0) when $0 < \theta_0 < 1/2$; the case $1/2 < \theta_0 < 1$
is similar. We need to show that $P\{(0, 0)^t \in C_n\} \to 1$. First, $P\{\hat a > x\} \to 1$ so
we can condition on the event that $\hat a > x$. Next, note that $m(X, \theta_0, \hat a)$ takes
only three possible values:

$$\begin{pmatrix} 1 - \theta_0 \\ -\theta_0 \end{pmatrix}, \quad
\begin{pmatrix} -\theta_0 \\ 1 - \theta_0 \end{pmatrix} \quad \text{or} \quad
\begin{pmatrix} -\theta_0 \\ -\theta_0 \end{pmatrix},$$

each with positive probability. It can be easily seen that the origin $(0, 0)$ is
contained in the interior of the convex hull of these three points, from which
the assertion follows.
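
The convex hull claim and the singularity of $V_2$ at $\theta_0 = 1/2$ can be checked numerically. The following is a minimal sketch: the helper names are hypothetical, and the weights $(\theta_0, \theta_0, 1 - 2\theta_0)$ are one explicit choice, worked out for this example, that averages the three values to the origin when $0 < \theta_0 < 1/2$. The matrix $V_2$ is taken with diagonal entries $\theta_0(1 - \theta_0)$ and off-diagonal entries $-\eta_0^2$.

```python
def v2_symmetric(theta0):
    """V_2 with diagonal theta0 (1 - theta0) and off-diagonal -eta0^2."""
    eta0 = min(theta0, 1.0 - theta0)
    diag = theta0 * (1.0 - theta0)
    return [[diag, -eta0 ** 2], [-eta0 ** 2, diag]]


def det2(mat):
    return mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]


def hull_weights(theta0):
    """Strictly positive weights (for 0 < theta0 < 1/2) on the three values of m."""
    return theta0, theta0, 1.0 - 2.0 * theta0


def weighted_average(theta0):
    """Average of the three possible values of m under hull_weights."""
    points = [(1.0 - theta0, -theta0), (-theta0, 1.0 - theta0), (-theta0, -theta0)]
    weights = hull_weights(theta0)
    return tuple(sum(w * p[c] for w, p in zip(weights, points)) for c in range(2))


print(weighted_average(0.3))
print(det2(v2_symmetric(0.3)), det2(v2_symmetric(0.5)))
```

Because all three weights are strictly positive and sum to one, the origin lies in the interior of the triangle rather than on its boundary.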
3.2. Integral of squared densities. Let $X_1, \ldots, X_n$ be i.i.d. from an
unknown density $f_0$ which is assumed to be uniformly continuous and
nonuniform. The quantity $\theta_0 = \int f_0^2 \, dx$ is of interest for various problems related
to nonparametric density estimation. The limit distribution of the
Hodges-Lehmann estimator of location has variance proportional to $1/\theta_0^2$; see Lehmann
(1983), page 383. Similarly, the power of the Wilcoxon rank test is essentially
determined by the size of $\theta_0$; see Lehmann (1975), page 72.

Consider the estimating function $m(X, \theta, f) = f(X) - \theta$ and let
$m_n = n^{-1/2} m$. As a plug-in for $f_0$, we employ a kernel density estimator
$\hat f(x) = n^{-1} \sum_{i=1}^n k_b(X_i - x)$, where $k_b(\cdot) = k(\cdot/b)/b$ is a scaled version of a symmetric
and bounded kernel function $k$ using bandwidth $b = b_n$. [For discussion of
methods for deciding on good kernel bandwidths, when the specific purpose
is precise estimation of $\theta_0$, see Schweder (1975).] Define

$$V = \int (f_0 - \theta_0)^2 f_0 \, dx = \int f_0^3 \, dx - \Bigl( \int f_0^2 \, dx \Bigr)^2,$$

which is the asymptotic variance of $n^{-1/2} \sum_{i=1}^n m(X_i, \theta_0, f_0)$, and is positive
since $f_0$ is nonuniform. We now show that (A2) holds with $V_2 = V$. Write

$$n^{-1} \sum_{i=1}^n m^2(X_i, \theta_0, \hat f) = n^{-1} \sum_{i=1}^n \{\hat f(X_i) - \theta_0\}^2 = \int \hat f^2 \, d\hat F - 2 \theta_0 \hat\theta + \theta_0^2$$

in terms of the empirical distribution function $\hat F$ and
$\hat\theta = n^{-1} \sum_{i=1}^n \hat f(X_i) = \int \hat f \, d\hat F$.
Then $\int \hat f \, d\hat F$ and $\int \hat f^2 \, d\hat F$ have the required limits in probability,
$\int f_0^2 \, dx$ and $\int f_0^3 \, dx$, respectively, provided $b \to 0$ and $nb \to \infty$. This verifies
(A2).

Checking (A1) requires a more precise study of

$$\hat\theta = n^{-1} \sum_{i=1}^n \hat f(X_i) = n^{-2} \sum_{i,j} k_b(X_i - X_j) = \frac{k(0)}{nb} + \frac{n - 1}{n} \hat g.$$

Here $\hat g = \hat g(0)$, where $\hat g(y) = \binom{n}{2}^{-1} \sum_{i<j} \bar k_b(Y_{i,j}, y)$ is a natural kernel
estimator of the density $g(y) = \int f_0(y + x) f_0(x) \, dx$ of the difference $Y_{i,j} = X_i - X_j$,
and $\bar k_b(Y_{i,j}, y) = \frac{1}{2}\{k_b(Y_{i,j} - y) + k_b(Y_{i,j} + y)\}$. Hjort (1999), Section 7, shows
that $\hat g(y)$ has mean value $g(y) + \frac{1}{2} b^2 g''(y) \int u^2 k(u) \, du + o(b^2)$, with variance
$(4/n)\{g^*(y) - g(y)^2\}$ plus smaller order terms, where
$g^*(y) = (1/4)\{\bar g(y, y) + \bar g(y, -y) + \bar g(-y, y) + \bar g(-y, -y)\}$
and $\bar g(y_1, y_2)$ is the joint density of two related
differences $(X_2 - X_1, X_3 - X_1)$. It follows that

$$n^{-1/2} \sum_{i=1}^n m(X_i, \theta_0, \hat f) = \sqrt{n}(\hat\theta - \theta_0)$$

has mean of order $O(1/(\sqrt{n} b) + \sqrt{n} b^2)$ and variance going to $4V$. This, in
conjunction with the asymptotic theory of U-statistics, verifies (A1) with
$U \sim N(0, 4V)$, under the conditions $\sqrt{n} b \to \infty$ and $\sqrt{n} b^2 \to 0$. (If $b = b_0 n^{-\alpha}$,
we need $1/4 < \alpha < 1/2$.) For (A3), note that $\hat f(x) \le b^{-1} k_{\max}$ for all $x$, where
$k_{\max}$ is the maximum of $k(u)$. Hence $\max_{i \le n} |\hat f(X_i) - \theta_0|$ is bounded by
$b^{-1} k_{\max} + \theta_0$, which implies (A3), provided only that $\sqrt{n} b \to \infty$.
Finally, for (A0) we need to show that

$$P\Bigl\{ \min_{1 \le i \le n} m(X_i, \theta_0, \hat f) < 0 < \max_{1 \le i \le n} m(X_i, \theta_0, \hat f) \Bigr\} \to 1.$$

First, consider

$$\max_{1 \le i \le n} m(X_i, \theta_0, \hat f) \ge \max_{1 \le i \le n} f_0(X_i) - \max_{1 \le i \le n} |\hat f(X_i) - f_0(X_i)| - \theta_0.$$

Note that $\max_{1 \le i \le n} |\hat f(X_i) - f_0(X_i)| \to 0$ a.s. by the uniform consistency
of $\hat f$, which holds for $b$ as above (and suitable kernels $k$) by Theorem A
of Silverman (1978), where we have used the assumption that $f_0$ is
uniformly continuous. An example of a suitable kernel is the standard
normal density function. Also, $\max_{1 \le i \le n} f_0(X_i) \to_{a.s.} \sup_t f_0(t) > \theta_0$, since $f_0$
is continuous and nonuniform, so $P\{\max_{1 \le i \le n} m(X_i, \theta_0, \hat f) > 0\} \to 1$. In a
similar way we can consider $\min_{1 \le i \le n} m(X_i, \theta_0, \hat f)$. We may now conclude
that $-2 \log \mathrm{EL}_n(\theta_0, \hat f) \to_d 4 \chi^2_1$.
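
The decomposition of $\hat\theta$ used in checking (A1) is an exact algebraic identity, which a small numerical check makes concrete. The sketch below assumes the standard normal kernel (mentioned above as a suitable choice); the function names and the toy sample are hypothetical, and the bandwidth exponent 0.3 is just one value in the admissible range $1/4 < \alpha < 1/2$.

```python
import math


def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def theta_hat(xs, b):
    """n^{-2} sum over all pairs (i, j) of k_b(X_i - X_j)."""
    n = len(xs)
    return sum(gauss_kernel((xi - xj) / b) / b for xi in xs for xj in xs) / (n * n)


def g_hat_zero(xs, b):
    """Kernel estimate at 0 of the density of the differences X_i - X_j, i < j."""
    n = len(xs)
    total = sum(gauss_kernel((xs[i] - xs[j]) / b) / b
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2.0)


xs = [0.1, 0.5, 0.9, 1.4, 2.0]
n = len(xs)
b = 1.0 * n ** (-0.3)  # b = b_0 n^{-alpha} with b_0 = 1 and alpha = 0.3 (illustrative)
lhs = theta_hat(xs, b)
rhs = gauss_kernel(0.0) / (n * b) + (n - 1) / n * g_hat_zero(xs, b)
print(lhs, rhs)
```

The diagonal term $k(0)/(nb)$ is the source of the $1/(\sqrt{n} b)$ contribution to the mean of $\sqrt{n}(\hat\theta - \theta_0)$, which is why $\sqrt{n} b \to \infty$ is needed.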

3.3. Functionals of survival distributions. Wang and Jing (2001)
(henceforth WJ) developed a plug-in version of EL for a class of functionals of
a survival function (including its mean) in the presence of censoring.
Denote the survival and censoring distribution functions by $F$ and $G$,
respectively. The parameter of interest is a linear functional of $F$ of the
form $\theta_0 = \theta(F) = \int \xi(t) \, dF(t)$, where $\xi$ is a (known) nonnegative
measurable function and $\theta(F)$ is assumed finite. The estimating function is
$m_n = n^{-1/2} m$, with

$$m(Z, \Delta, \theta, G) = \frac{\xi(Z) \Delta}{1 - G(Z)} - \theta,$$

where $Z = \min(X, Y)$ and $\Delta = I\{X \le Y\}$. Here $X \sim F$ and $Y \sim G$ are
assumed to be independent. The Kaplan-Meier estimator $G_n$ of the censoring
distribution function $G$ plays the role of the plug-in estimator. The resulting
estimator $\hat\theta$ of $\theta_0$ takes the form of an inverse-probability-weighted average.
Equivalently, $\hat\theta = \theta(F_n)$, where $F_n$ is the Kaplan-Meier estimator of $F$; see
Satten and Datta (2001) for further discussion and references.
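
The inverse-probability-weighted form of $\hat\theta$ can be sketched as follows. This is a minimal illustration assuming no tied observation times; the function names `km_censoring_survival` and `ipcw_estimate` are hypothetical, and the choice $\xi(t) = t$ (the mean) in the usage lines is only an example.

```python
def km_censoring_survival(z, delta):
    """Return 1 - G_n(Z_i) for each i, where G_n is the Kaplan-Meier estimator
    of the censoring distribution (censored points, delta == 0, act as events).
    Assumes there are no ties among the observed times."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    surv = [0.0] * n
    s = 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank
        if delta[i] == 0:
            s *= 1.0 - 1.0 / at_risk
        surv[i] = s
    return surv


def ipcw_estimate(z, delta, xi):
    """n^{-1} sum_i xi(Z_i) Delta_i / {1 - G_n(Z_i)}; censored terms contribute 0."""
    surv = km_censoring_survival(z, delta)
    n = len(z)
    return sum(xi(z[i]) / surv[i] for i in range(n) if delta[i] == 1) / n


times = [1.0, 2.0, 3.0]
status = [1, 0, 1]  # the second observation is censored
print(ipcw_estimate(times, status, lambda t: t))
```

In this three-point example the Kaplan-Meier estimator of $F$ puts mass $1/3$ at time 1 and $2/3$ at time 3, so $\theta(F_n) = 7/3$, which agrees with the weighted average computed by the sketch.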

The conditions (A0)-(A3) needed to apply Theorem 2.1 are now checked
by referring to various parts of WJ's proof of their Theorem 2.1, the
conditions of which we assume implicitly. For (A0) we need to make the further
mild assumption that the distribution of $\xi(X)$ is nondegenerate (i.e., not
concentrated at its mean $\theta_0$). Then,

$$\max_{1 \le i \le n} m(Z_i, \Delta_i, \theta_0, G_n) \ge \max_{1 \le i \le n} \xi(Z_i) \Delta_i - \theta_0,$$

which is strictly positive for $n$ sufficiently large a.s. Also,

$$\min_{1 \le i \le n} m(Z_i, \Delta_i, \theta_0, G_n) = -\theta_0 < 0$$

for $n$ sufficiently large a.s. This, together with the lower bound for the
maximum, entails (A0). Condition (A1) is immediate from the lemma on page 524
of WJ, with $U \sim N(0, V_1)$ and $V_1$ being the asymptotic variance of $\hat\theta$.
Condition (A2) is checked using a Glivenko-Cantelli argument almost identical
to that used below for estimation of $V_2$, where $V_2 = E m^2(Z, \Delta, \theta_0, G) < \infty$
by condition (C3) of WJ. Condition (A3) is the display immediately before
(4.5) in WJ.

It remains to provide consistent estimators of $V_1$ and $V_2$, and we do this
along the lines of Remark 2.2. Stute's (1996) jackknife estimator can be used
for $V_1$. Under conditions (A4)-(A5), we have that
$\hat V_2 = n^{-1} \sum_{i=1}^n m^2(Z_i, \Delta_i, \hat\theta, G_n)$
consistently estimates $V_2$, where we also use the consistency of $\hat\theta$. To
check (A4), assume that $G(\tau_H-) < 1$, where $\tau_H = \inf\{t : H(t) = 1\}$, and $H$
is the distribution function of $Z$. Choose a constant $c$ such that
$G(\tau_H-) < c < 1$. Specify $\mathcal{H}_1$ as the class of increasing nonnegative functions $h$ such that
$h(\tau_H-) < c$ and $h(t) = h(\tau_H)$ for $t \ge \tau_H$. Now, $\sup_{0 \le t < \tau_H} |G_n(t) - G(t)|$ is
bounded by

$$\begin{aligned}
&\sup_{0 \le t < \tau_H} |G_n(t) - G(t \wedge Z_{(n)})| + \sup_{0 \le t < \tau_H} |G(t \wedge Z_{(n)}) - G(t)| \\
&\quad = \sup_{0 \le t \le Z_{(n)}} |G_n(t) - G(t)| + \sup_{Z_{(n)} < t < \tau_H} |G(Z_{(n)}) - G(t)| \to_{pr} 0,
\end{aligned}$$

by uniform consistency of $G_n$ on the interval $[0, Z_{(n)}]$; see Wang (1987).

Thus $P\{G_n \in \mathcal{H}_1\} = P\{G_n(\tau_H-) < c\} \to 1$. The class $\{1/(1 - h) : h \in \mathcal{H}_1\}$ is
contained in the class of all monotone functions into $[0, 1/(1 - c)]$, which is
Glivenko-Cantelli; see van der Vaart and Wellner (1996), page 149. Thus,
using the preservation property of Glivenko-Cantelli classes under a
continuous function [see van der Vaart and Wellner (2000)], it follows that $\mathcal{F}$,
defined right after conditions (A4) and (A5), is Glivenko-Cantelli. Condition
(A5) follows by noting that $E|m^2(Z, \theta, h) - m^2(Z, \theta_0, h)|$ is bounded above
by

$$E\bigl( |m(Z, \theta, h) - m(Z, \theta_0, h)| \, |m(Z, \theta, h) + m(Z, \theta_0, h)| \bigr)
\le \|\theta - \theta_0\| \bigl\{ \|\theta + \theta_0\| + 2 E|\xi(Z)| / (1 - c) \bigr\}$$

for $h \in \mathcal{H}_1$.
3.4. Error distributions in nonparametric regression. Consider the model
$Y = \mu(X) + \varepsilon$, where $X$ and $\varepsilon$ are independent, $\varepsilon$ has unknown distribution
function $F_\varepsilon$, and $\mu(\cdot)$ is an unknown regression function. We now use our
approach with bootstrap calibration to construct an EL confidence
interval for $\theta_0 = F_\varepsilon(z) \in (0, 1)$, at a fixed point $z$. The same assumptions as in
Akritas and Van Keilegom (2001) are imposed. In particular, $F_\varepsilon$ is assumed
to be continuous, $\mu(\cdot)$ is smooth and $X$ is bounded. For simplicity we restrict
$X$ to $(0, 1)$.

Consider the Nadaraya-Watson estimator $\hat\mu(x) = \sum_{i=1}^n W_{n,i}(x; b_n) Y_i$, with
weights $W_{n,i}(x; b_n) = k_{b,x}(X_i) / \sum_{j=1}^n k_{b,x}(X_j)$ in terms of a kernel
function $k$ and scaled versions $k_{b,x}(u) = b^{-1}k((u-x)/b)$ thereof, with
$b = b_n = b_0 n^{-2/7}$ a bandwidth sequence (other choices of the bandwidth are possible).
The estimating function is $m_n = n^{-1/2} m$, where
$m(X,Y,\theta,\mu) = I\{Y - \mu(X) \le z\} - \theta$.
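
The construction above can be sketched numerically. The following is a minimal illustration only, assuming an Epanechnikov kernel, a sine regression function, normal errors and the constant $b_0 = 0.5$; none of these choices come from the text, and the function names are hypothetical.

```python
import numpy as np

def nadaraya_watson(x_eval, X, Y, b):
    """Nadaraya-Watson estimate of mu at the points x_eval (Epanechnikov kernel, an assumption)."""
    u = (np.asarray(x_eval)[:, None] - X[None, :]) / b
    K = 0.75 * np.clip(1.0 - u**2, 0.0, None) / b      # k_{b,x}(X_i)
    W = K / K.sum(axis=1, keepdims=True)               # weights W_{n,i}(x; b)
    return W @ Y

def estimating_function(Y, theta, mu_hat_at_X, z):
    """m(X_i, Y_i, theta, mu_hat) = I{Y_i - mu_hat(X_i) <= z} - theta."""
    return (Y - mu_hat_at_X <= z).astype(float) - theta

# Simulated data (illustrative assumptions: mu(x) = sin(2 pi x), N(0, 0.5^2) errors)
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.5, n)

b = 0.5 * n ** (-2.0 / 7.0)            # b_n = b_0 n^{-2/7} with hypothetical b_0 = 0.5
mu_hat = nadaraya_watson(X, X, Y, b)   # fitted values mu_hat(X_i)
z = 0.0
theta_hat = np.mean(Y - mu_hat <= z)   # residual-based estimate of F_eps(z)
m_vals = estimating_function(Y, theta_hat, mu_hat, z)
```

By construction the values `m_vals` average to zero at `theta_hat`, which is the point where the plug-in EL ratio equals one.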

We now check the conditions of Theorem 2.1. First, (A1) follows from the
asymptotic normality of $\hat\theta = n^{-1}\sum_{i=1}^n I\{\hat\varepsilon_i \le z\}$ [with
$\hat\varepsilon_i = Y_i - \hat\mu(X_i)$], given by Theorem 2 in Akritas and Van Keilegom (2001):
$\sqrt{n}\{\hat F_\varepsilon(z) - F_\varepsilon(z)\} = n^{-1/2}\sum_{i=1}^n m(X_i,Y_i,\theta_0,\hat\mu) \to_d N(0,V_1)$,
where $V_1$ is defined in their paper. Condition (A2) holds with $V_2 = \theta_0(1-\theta_0)$,
provided $0 < \theta_0 < 1$. Also, (A3) holds since the function $\sqrt{n}\,m_n$ is uniformly
bounded by 1. Finally, (A0) is an immediate consequence of the fact that
$P\{Y - \hat\mu(X) \le z\}$ (probability conditionally on the function $\hat\mu$) converges to
$F_\varepsilon(z)$, which follows from a Taylor expansion and the uniform consistency of
$\hat\mu$. Since $F_\varepsilon(z)$ is strictly between 0 and 1, it follows that

\[
P\{\text{there exist } 1\le i,j\le n \text{ such that } Y_i - \hat\mu(X_i) \le z \text{ and } Y_j - \hat\mu(X_j) > z\} \to 1,
\]

which yields (A0).

It remains to estimate $V_1$ and $V_2$. Note that $\hat V_2 = \hat\theta(1-\hat\theta)$ consistently
estimates $V_2$. However, $V_1$ is harder to estimate. A plug-in type estimator
can be obtained by making use of the estimator of the error density
in Van Keilegom and Veraverbeke (2002). Since this approach requires the
selection of a new bandwidth, we prefer to use the bootstrap approach. We
now check the conditions of Theorem 2.2. For (A4), set $\delta > 0$ and define

\[
C^{1+\delta}(0,1) = \{\text{differentiable } f : (0,1)\to\mathbb{R}, \text{ such that } \|f\|_{1+\delta} \le 1\},
\]

where

\[
\|f\|_{1+\delta} = \max\{\|f\|_\infty, \|f'\|_\infty\} + \sup_{x,y}\frac{|f'(x)-f'(y)|}{|x-y|^{\delta}}
\]

and $\|\cdot\|_\infty$ denotes the supremum norm. Careful examination of the proof of
Lemma 1 in Akritas and Van Keilegom (2001) reveals that the class
$\{I(\varepsilon \le z + f(X)) : f \in C^{1+\delta}(0,1)\}$ is Donsker, which is, using the notation
of that proof, equal to the class $\mathcal{F}_1$ with $d_2 = 1$ and $z$ fixed. Therefore, also the class

\[
\{I(\varepsilon \le z + f(X)) - \theta : f \in C^{1+\delta}(0,1),\ \theta\in[0,1]\}
= \{I\{Y - h(X) \le z\} - \theta : h\in\mathcal{H},\ \theta\in[0,1]\}
\]

is Donsker, and hence Glivenko-Cantelli, where $\mathcal{H} = \mu + C^{1+\delta}(0,1)$,
and is endowed with the supremum norm. As a consequence, the class
$\mathcal{F}$, defined right after (A4) and (A5), is also Glivenko-Cantelli. Moreover,
$P\{\hat\mu \in \mathcal{H}\} \to 1$ by Propositions 3-5 in Akritas and Van Keilegom (2001).
Condition (A5) is satisfied since for any $\delta_n \downarrow 0$,

\[
\sup_{|\theta-\theta_0|\le\delta_n,\ h\in\mathcal{H}} |E m^2(X,Y,\theta,h) - E m^2(X,Y,\theta_0,h)|
\le \delta_n \sup_{|\theta-\theta_0|\le\delta_n,\ h\in\mathcal{H}} E|2I\{Y - h(X)\le z\} - \theta - \theta_0| \to 0.
\]

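Next, let us calculate $\Gamma(\theta,h)[\bar h - h]$ for any $h,\bar h \in \mathcal{H}$. We find

\[
\begin{aligned}
\lim_{\tau\to0}\{M(\theta, h+\tau(\bar h - h)) - M(\theta,h)\}/\tau
&= \lim_{\tau\to0}\tau^{-1}\int \bigl[F_{Y|x}(z + h(x) + \tau(\bar h(x)-h(x))) - F_{Y|x}(z+h(x))\bigr]\,dF_X(x) \\
&= \int f_{Y|x}(z+h(x))\,(\bar h(x)-h(x))\,dF_X(x),
\end{aligned}
\]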

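where $F_{Y|x}$ and $f_{Y|x}$ are the distribution and density function of $Y$ given
$X = x$, and $F_X$ is the distribution function of $X$. Consequently,

\[
\begin{aligned}
\Phi_n &= n^{-1/2}\sum_{i=1}^n \bigl[I\{Y_i-\mu(X_i)\le z\}-\theta_0\bigr]
 + n^{-1/2}\sum_{i=1}^n \int f_{Y|x}(z+\mu(x))\bigl(k_{b,x}(X_i)Y_i - E\{k_{b,x}(X)Y\}\bigr)\,dx + o_{\rm pr}(1) \\
&= n^{-1/2}\sum_{i=1}^n \bigl[I\{Y_i-\mu(X_i)\le z\}-\theta_0\bigr]
 + \sqrt{n}\Bigl[n^{-1}\sum_{i=1}^n f_{Y|X_i}(z+\mu(X_i))Y_i - E\{f_{Y|X}(z+\mu(X))Y\}\Bigr] + o_{\rm pr}(1).
\end{aligned}
\tag{3}
\]

In a similar way, we obtain

\[
\begin{aligned}
\Phi_n^* &= n^{-1/2}\sum_{i=1}^n \Bigl[I\{Y_i^*-\hat\mu(X_i^*)\le z\} - n^{-1}\sum_{j=1}^n I\{Y_j-\hat\mu(X_j)\le z\}\Bigr] \\
&\quad + \sqrt{n}\Bigl[n^{-1}\sum_{i=1}^n f_{Y|X_i^*}(z+\hat\mu(X_i^*))Y_i^* - E^*\{f_{Y|X^*}(z+\hat\mu(X^*))Y^*\}\Bigr] + o_{P^*}(1).
\end{aligned}
\tag{4}
\]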



Both (3) and (4) converge to zero-mean normal random variables [use,
e.g., the Lindeberg condition to show the convergence of (4)]. We next show
that the asymptotic variance of (4) converges in probability to the asymptotic
variance of (3). To show this we restrict attention to the first term of
(3) and (4) (the convergence of the variance of the second term and of the
covariance between the two terms can be established in a similar way). Note
that the variance of the first term of (3) respectively (4) equals $\theta_0(1-\theta_0)$
respectively $n^{-1}\sum_{i=1}^n I\{Y_i-\hat\mu(X_i)\le z\}\,[1 - n^{-1}\sum_{i=1}^n I\{Y_i-\hat\mu(X_i)\le z\}]$.

Since it follows from Lemma 1 in Akritas and Van Keilegom (2001) that

\[
\begin{aligned}
n^{-1}\sum_{i=1}^n I\{Y_i-\hat\mu(X_i)\le z\}
&= \theta_0 + n^{-1}\sum_{i=1}^n \bigl[I\{Y_i-\mu(X_i)\le z\}-\theta_0\bigr]
 + P\{Y-\hat\mu(X)\le z \mid \hat\mu\} - \theta_0 + o_{\rm pr}(n^{-1/2}) \\
&= \theta_0 + o_{\rm pr}(1),
\end{aligned}
\]

the result follows. Hence, (B1) is satisfied. For (B2) it suffices by Remark 2.8
to show that the class $\{I\{Y-h(X)\le z\}-\theta : 0\le\theta\le1,\ h\in\mathcal{H}\}$ is Donsker,
which we have already established before, and that

\[
{\rm Var}\bigl[I\{Y-h(X)\le z\} - I\{Y-\mu(X)\le z\} - \theta + \theta_0\bigr]
\]

is bounded by $K_1|\theta-\theta_0| + K_2\|h-\mu\|_\infty$ for some $K_1,K_2 > 0$. A similar
derivation can be given for the bootstrap analogue of (B2). Next write

\[
\begin{aligned}
|M(\theta_0,\hat\mu) - \Gamma(\theta_0,\mu)[\hat\mu-\mu]|
&= \Bigl| P\{Y-\hat\mu(X)\le z\} - \theta_0 - \int f_{Y|x}(z+\mu(x))\{\hat\mu(x)-\mu(x)\}\,dF_X(x) \Bigr| \\
&= \Bigl| \int \bigl[F_{Y|x}(z+\hat\mu(x)) - F_{Y|x}(z+\mu(x)) - f_{Y|x}(z+\mu(x))\{\hat\mu(x)-\mu(x)\}\bigr]\,dF_X(x) \Bigr| \\
&= \Bigl| \tfrac12\int f'_{Y|x}(z+\xi(x))\{\hat\mu(x)-\mu(x)\}^2\,dF_X(x) \Bigr|
 \le K \sup_x |\hat\mu(x)-\mu(x)|^2,
\end{aligned}
\]

for some $\xi(x)$ between $\mu(x)$ and $\hat\mu(x)$, and for some positive $K$. This shows
that (B3) holds. In a similar way, the bootstrap version of (B3) can be shown
to hold. Finally, condition (B4) follows from, for example,
Härdle, Janssen and Serfling (1988), and its bootstrap version can be established
in a very similar way. It now follows that a $100(1-\alpha)\%$ confidence
interval for $F_\varepsilon(z)$ is given by $\{\theta : -2\log{\rm EL}_n(\theta,\hat\mu) \le c_{1-\alpha}\}$,
where $c_{1-\alpha}$ is the $100(1-\alpha)\%$ percentile of the distribution of

\[
\Bigl[n^{-1/2}\sum_{i=1}^n \bigl(I\{Y_i^*-\hat\mu^*(X_i^*)\le z\} - \hat\theta\bigr)\Bigr]^2 \Big/ \{\hat\theta(1-\hat\theta)\}.
\]


3.5. Density estimation. Let $X_1,\dots,X_n$ be i.i.d. from an unknown density
$f_0$, and suppose we are interested in estimating $\theta_0 = f_0(t)$, for $t$ fixed.
We do this using the kernel density estimator $\hat f_n(t) = n^{-1}\sum_{i=1}^n k_b(X_i - t)$,
where $k_b(u) = b^{-1}k(b^{-1}u)$ is a $b$-scaled version of a symmetric, bounded kernel
function $k$, supported on $[-1,1]$. We choose here to employ bandwidths
$b = b_n$ that satisfy $nb \to \infty$ and $nb^5 \to 0$. The rate $b = cn^{-1/5}$ (for some $c > 0$)
is optimal for estimating $f_0(t)$, in the sense of minimizing the asymptotic
mean squared error, but as we here aim at constructing confidence intervals,
an undersmoothing rate is preferable. Hall and Owen (1993) constructed EL
confidence bands for $f_0$, and Chen (1996) showed that the pointwise EL confidence
intervals (with and without Bartlett correction) are more accurate
than those based on the bootstrap.

Following these authors, we use the sequence of estimating functions
$m_n(x,\theta) = n^{-1/2}b^{1/2}\{k_b(x-t) - \theta\}$, which do not involve plug-in.
We now check the conditions of Theorem 2.1. For (A0), note that
$\sqrt{n}\,b^{-1/2}\min_{1\le i\le n} m_n(X_i,\theta_0) = -\theta_0 < 0$, and

\[
\sqrt{n}\,b^{-1/2}\max_{1\le i\le n} m_n(X_i,\theta_0)
= \max_{1\le i\le n} b^{-1}k\Bigl(\frac{X_i-t}{b}\Bigr) - \theta_0 \to_{\rm a.s.} \infty
\]

provided $f_0$ is bounded away from 0 in a neighborhood of $t$. Condition (A1)
can be checked under mild conditions on the density, as it follows from standard
asymptotic theory for kernel density estimators that
$\sum_{i=1}^n m_n(X_i,\theta_0) = (nb)^{1/2}\{\hat f_n(t) - f_0(t)\}$ tends to $N(0,V_1)$, where

\[
V_1 = f_0(t)R(k) \quad\text{and}\quad R(k) = \int k(u)^2\,du. \tag{5}
\]

For (A2),

\[
\sum_{i=1}^n m_n^2(X_i,\theta_0) = \frac{b}{n}\sum_{i=1}^n \{k_b(X_i-t)-\theta_0\}^2
= \frac{1}{nb}\sum_{i=1}^n k((X_i-t)/b)^2 + O_{\rm pr}(b),
\]

which converges to $f_0(t)R(k) = V_1$ in probability. For (A3),
$\max_{i\le n}|m_n(X_i,\theta_0)| = O((nb)^{-1/2}) = o(1)$, because $k$ is bounded and $nb \to \infty$.
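
To make the density example concrete, the EL statistic for a scalar estimating function can be computed by a one-dimensional root search for the Lagrange multiplier. The sketch below is illustrative only: the Epanechnikov kernel, the bandwidth $b = n^{-1/3}$ (which satisfies $nb\to\infty$ and $nb^5\to0$), the bisection solver and the function names are assumptions, not part of the original development.

```python
import numpy as np

def neg2_log_el(m):
    """-2 log EL for scalar estimating-function values m_1, ..., m_n.

    Solves sum_i m_i / (1 + lam * m_i) = 0 by bisection. Returns inf when
    zero is not strictly inside the range of the m_i, cf. condition (A0)."""
    m = np.asarray(m, dtype=float)
    if m.min() >= 0.0 or m.max() <= 0.0:
        return np.inf
    # 1 + lam * m_i > 0 for all i requires lam in (-1/max m, -1/min m)
    lo, hi = -1.0 / m.max(), -1.0 / m.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # the score is strictly decreasing in lam
        if np.sum(m / (1.0 + mid * m)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * m))

def kde_el_statistic(X, t, theta, b):
    """EL statistic based on m_n(x, theta) = n^{-1/2} b^{1/2} {k_b(x - t) - theta}."""
    n = len(X)
    u = (X - t) / b
    kb = 0.75 * np.clip(1.0 - u**2, 0.0, None) / b   # Epanechnikov k_b (an assumption)
    return neg2_log_el(np.sqrt(b / n) * (kb - theta))

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal(n)
t, b = 0.0, n ** (-1.0 / 3.0)
f_hat = np.mean(0.75 * np.clip(1.0 - ((X - t) / b) ** 2, 0.0, None) / b)
stat_at_estimate = kde_el_statistic(X, t, f_hat, b)                  # essentially zero
stat_at_truth = kde_el_statistic(X, t, 1.0 / np.sqrt(2 * np.pi), b)  # compare with chi^2_1
```

A pointwise confidence interval would collect all $\theta$ for which the statistic stays below the appropriate quantile; here $V_1 = V_2$, so the $\chi^2_1$ quantile applies.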

3.6. Survival function estimation for current status data. Suppose there
is a failure time of interest $T \sim F$, with survival function $S = 1 - F$ and
density $f$, but we only get to observe $Z = (C,\Delta)$, where $\Delta = I\{T \le C\}$ and
$C \sim G$ is an independent check-up time (with density $g$). The observations
are assumed to be i.i.d.

The nonparametric maximum likelihood estimator $S_n(t)$ of $S(t)$ exists.
Groeneboom (1987) showed that $n^{1/3}\{S_n(t) - S(t)\}$ converges to a nondegenerate
limit law. The limit is not distribution-free, however, and is unsuitable
for providing a confidence region for $S(t)$. Banerjee and Wellner
(2005) found a universal limit law for the likelihood ratio statistic, leading
to tractable confidence intervals. Our approach based on estimating equations
offers a simpler type of EL confidence region, and extends to the setting
in which $T$ and $C$ are conditionally independent given a covariate (although
for simplicity we restrict attention to the case of no covariates).

First consider estimation of a smooth functional of $S$ (such as its mean):
$\theta_0 = \int_0^\infty k(u)S(u)\,du$, where $k : [0,\infty) \to \mathbb{R}$ is fixed. This parameter
can be estimated at a $\sqrt{n}$-rate, and $m_n(Z,\theta,F,g,k) = n^{-1/2}m(Z,\theta,F,g,k)$ is an
efficient influence curve, where

\[
m(Z,\theta,F,g,k) = \frac{k(C)(1-\Delta)}{g(C)} - \frac{k(C)\{1-F(C)\}}{g(C)}
 + \int_0^\infty k(u)\{1-F(u)\}\,du - \theta,
\]

and, given suitable preliminary estimators $\hat F$ and $\hat g$ of $F$ and $g$, respectively,
we have a plug-in estimating function $m(Z,\theta,\hat F,\hat g,k)$ that yields a consistent
estimator of $\theta_0$ when either $\hat F$ or $\hat g$ is consistent; see van der Laan and Robins
(1998).

Now consider estimation of $0 < \theta_0 = S(t) < 1$. Van der Vaart and van
der Laan (2006) introduced a kernel-type estimator $S_{n,b}(t)$ and showed that
$n^{1/3}\{S_{n,b}(t) - S(t)\} \to_d N(0,V_1)$, for appropriate and positive $V_1$. Their approach
is to replace $k$ above by $k_n = k_{b,t}$, a kernel function of bandwidth
$b = b_n = b_0 n^{-1/3}$ centered at $t$. Here $k_{b,t}(u) = k((u-t)/b)/b$ in terms of a
bounded density $k$ supported on $[-1,1]$. This yields a sequence of (plug-in)
estimating functions $m_n(Z,\theta,\hat F,\hat g) = n^{-2/3}m(Z,\theta,\hat F,\hat g,k_n)$, and the estimator
is written as $S_{n,b}(t) = \mathbb{P}_n\psi(\hat F,\hat g,k_n)$, where $\mathbb{P}_n$ is the empirical measure,
and $\psi(F,g,k_n)(Z) = m(Z,0,F,g,k_n)$ is the influence curve. The asymptotic
variance of $S_{n,b}(t)$ is $V_1 = b_0^{-1}\sigma^2R(k)$, where $R(k)$ is as in (5) and $\sigma^2$ depends
on $F$ and $g$, as well as on the limits of $\hat g$ and $\hat F$.
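
The kernel-type estimator $S_{n,b}(t) = \mathbb{P}_n\psi(\hat F,\hat g,k_n)$ can be sketched as follows. This is a hedged illustration, not the authors' implementation: the Epanechnikov kernel, the simulated exponential failure times and uniform check-up times, $b_0 = 1$, and the deliberately wrong constant guess for $F$ (paired with the true $g$, to illustrate consistency when only one nuisance estimate is correct) are all assumptions made for this example.

```python
import numpy as np

def kernel_bt(u, t, b):
    """k_{b,t}(u) = k((u - t)/b)/b with the Epanechnikov density k on [-1, 1]."""
    v = (u - t) / b
    return 0.75 * np.clip(1.0 - v**2, 0.0, None) / b

def psi(C, Delta, F, g, t, b, grid_size=2001):
    """psi(F, g, k_n)(Z) = k_n(C){F(C) - Delta}/g(C) + int k_n(u){1 - F(u)} du."""
    u = np.linspace(t - b, t + b, grid_size)
    y = kernel_bt(u, t, b) * (1.0 - F(u))
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u))   # trapezoid rule
    return kernel_bt(C, t, b) * (F(C) - Delta) / g(C) + integral

rng = np.random.default_rng(1)
n = 50_000
T = rng.exponential(1.0, n)             # failure times, S(t) = exp(-t)
C = rng.uniform(0.0, 3.0, n)            # check-up times, g = 1/3 on (0, 3)
Delta = (T <= C).astype(float)

t = 1.0
b = 1.0 * n ** (-1.0 / 3.0)             # b_n = b_0 n^{-1/3}, hypothetical b_0 = 1
g_true = lambda c: np.full_like(c, 1.0 / 3.0)
F_wrong = lambda u: np.full_like(u, 0.5)    # deliberately inconsistent guess for F

S_nb = np.mean(psi(C, Delta, F_wrong, g_true, t, b))   # estimate of S(1) = exp(-1)
```

The values `psi(...) - theta` scaled by $n^{-2/3}$ are exactly the estimating-function values that enter the EL statistic at a candidate $\theta$.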

We adopt the same assumptions as van der Vaart and van der Laan. In
particular, assume that $F$ is differentiable at $t$, and $g$ is twice continuously
differentiable and bounded away from zero in a neighborhood of $t$. Also, $\hat g$
and $\hat F$ are assumed to belong to classes of functions having uniform entropy
of order $(1/\varepsilon)^V$, for some $V < 2$, with probability tending to 1, and $\hat g$, or $\hat F$,
or both, are locally consistent at $t$.

Our result for estimating functions with plug-in gives

\[
-2\log{\rm EL}_n(S(t),\hat F,\hat g,k_n) \to_d \chi^2_1.
\]

Conditions (A0)-(A3) are easily checked by referring to van der Vaart and
van der Laan's Theorem 2.1 and its proof. For (A0), note that

\[
n^{2/3}m_n(Z_i,\theta_0,\hat F,\hat g) = \frac{k_n(C_i)}{\hat g(C_i)}\{\hat F(C_i)-\Delta_i\}
 + \int_0^\infty k_n(u)\{1-\hat F(u)\}\,du - \theta_0.
\]

The minimum and maximum over $i \le n$ of the first term above tend a.s.
to $-\infty$ and $+\infty$, respectively, since $0 < P(\Delta = 1) < 1$ and it is assumed
that $g$ is bounded away from 0 in a neighborhood of $t$. The second term
above stays bounded as $n$ tends to infinity, so (A0) holds. Next, note that
$\sum_{i=1}^n m_n(Z_i,\theta_0,\hat F,\hat g) = n^{1/3}\{S_{n,b}(t) - S(t)\}$, so (A1) holds [with $V_1$ given by
the asymptotic variance of $S_{n,b}(t)$]. For (A2), note that

\[
\begin{aligned}
\sum_{i=1}^n m_n^2(Z_i,\theta_0,\hat F,\hat g)
&= n^{-1/3}\,\mathbb{P}_n\{\psi(\hat F,\hat g,k_n) - S(t)\}^2 \\
&= n^{-1/3}\,\mathbb{P}_n\{\psi(\hat F,\hat g,k_n) - P\psi(\hat F,\hat g,k_n)\}^2 \\
&\quad + 2n^{-1/3}\{S_{n,b}(t) - S(t)\}\{P\psi(\hat F,\hat g,k_n) - S(t)\} \\
&\quad - n^{-1/3}\{P\psi(\hat F,\hat g,k_n) - S(t)\}^2.
\end{aligned}
\tag{6}
\]

The last two terms above are asymptotically negligible, by the usual argument
for controlling the bias of a kernel estimator; see the start of the proof
of Theorem 2.1 of van der Vaart and van der Laan. To handle the first term,
the influence function $\psi$ is split into a sum of two terms $\psi_1$ and $\psi_2$, where

\[
\psi_2(F,g,k_n)(Z) = \int_0^\infty k_n(u)\{1-F(u)\}\,du
\]

does not give any contribution in the limit. In our case, $\psi_2$ acts as a constant
function (there are no covariates), so the first term in (6) with $\psi$ replaced by
$\psi_2$ is $O(n^{-1/3})$. The first term of (6) with $\psi$ replaced by $\psi_1$ can be expressed
as

\[
n^{-1/3}\bigl(n^{-1/2}\mathbb{G}_nH_n\bigr) + n^{-1/3}PH_n, \tag{7}
\]

where $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$ is the empirical process and

\[
H_n(F,g,k_n)(\cdot) = \{\psi_1(F,g,k_n) - P\psi_1(F,g,k_n)\}^2.
\]

Applying the part of their proof that deals with $\psi_1$, but with $\psi_1$ replaced
by $H_n$ and $n^{-1/2}k_n^2$ as the envelope functions, shows that $n^{-1/2}\mathbb{G}_nH_n$ is
asymptotically tight. They also show that $n^{-1/3}PH_n \to_{\rm pr} b_0^{-1}\sigma^2R(k)$, with
$R(k)$ as in (5). Thus, only the second term in (7) gives a contribution in the
limit, and we have

\[
\sum_{i=1}^n m_n^2(Z_i,\theta_0,\hat F,\hat g) \to_{\rm pr} b_0^{-1}\sigma^2R(k) = V_1,
\]

establishing (A2) with $V_2 = V_1$. Finally, (A3) is checked using the assumption
that $g$ is bounded away from zero in a fixed neighborhood of $t$. Note that
$k_n \le c\,b_n^{-1}1_{[t-b_n,\,t+b_n]}$ for some constant $c$, so

\[
\max_{i\le n}|m_n(Z_i,\theta_0,\hat F,\hat g)| = O_{\rm pr}(n^{-1/3}) = o_{\rm pr}(1).
\]

4. Empirical likelihood asymptotics with growing dimensions. The traditional
empirical likelihood theory works for a fixed number of estimating
functions $p$, or, when estimating a mean, for data having a fixed dimension
$d$. The present section is concerned with the question of how this theory may
be extended toward allowing $p$ to increase with growing sample size. Consider
situations with, say, $d$-dimensional observations $Z_1,\dots,Z_n$ for which
there are $p$-dimensional estimating functions $m(Z_i,\theta)$ to help assess a
$p$-dimensional parameter $\theta$, and define

\[
{\rm EL}_n(\theta) = \max\Bigl\{\prod_{i=1}^n (nw_i) : \text{each } w_i > 0,\ \sum_{i=1}^n w_i = 1,\ \sum_{i=1}^n w_i m(Z_i,\theta) = 0\Bigr\}. \tag{8}
\]

Thus the framework is "triangular," reflecting a setup where the key quantities
$p = p_n$, $d = d_n$, $Z_i = Z_{n,i}$, $\theta = \theta_n$, $m(z,\theta) = m_n(z,\theta)$ depend on $n$, but
where we most of the time do not insist on keeping the extra subscript in
the notation. A particular example would be $p$-dimensional $Z_i$'s for which
their mean parameter $\mu$ is to be assessed, corresponding to estimation equation
$m(z,\mu) = z - \mu$. We allow $p$ to grow with $n$, and study the problem
of establishing sufficient conditions under which the standard $\chi^2_p$ calibration
can still be used. There would often be a connection between $d$ and $p$, and
indeed sometimes $d = p$, but the main interplay is between $n$ and $p$, and we
do not need to make explicit requirements on $d = d_n$ itself.

We shall use several steps to approximate the EL statistic (8), and approximation
results will be reached under different sets of conditions. Our
results and tools for proving them shall involve the quantities

\[
\bar X_n = n^{-1}\sum_{i=1}^n X_{n,i}, \qquad S_n = n^{-1}\sum_{i=1}^n X_{n,i}X_{n,i}^{\rm t}, \qquad D_n = \max_{i\le n}\|X_{n,i}\|, \tag{9}
\]

where $X_{n,i} = m(Z_{n,i},\theta_n)$. Here $\theta_n$ is the correct parameter, assumed to be
properly defined as a function of the underlying distribution of $Z_{n,1},\dots,Z_{n,n}$
and the requirement that the mean value of $n^{-1}\sum_{i=1}^n m_n(Z_{n,i},\theta_n)$ is zero
(stressing in our notation, for this occasion, the dependence on $n$). We need
$S_n$ to be positive definite, that is, at least $p$ among the $n$ vectors $X_{n,i}$ are
linearly independent. In particular, $n \ge p$, and $p$ shall in fact have to grow
somewhat slowly with $n$ in order for our approximation theorems to hold.

4.1. Main results. At the heart of the standard large-sample EL theorem
lies the fact that

\[
T_n = -2\log{\rm EL}_n(\theta_n) \text{ is close to } T_n^* = n\bar X_n^{\rm t}S_n^{-1}\bar X_n. \tag{10}
\]

One may view (10) as half of the story of how the EL behaves for large $n$
and $p$, the other half being how close $T_n^*$ then is to a $\chi^2_p$. A natural aim is
therefore to secure conditions under which

\[
(T_n - T_n^*)/p^{1/2} \to_{\rm pr} 0 \quad\text{and}\quad (T_n^* - p)/(2p)^{1/2} \to_d N(0,1). \tag{11}
\]

These statements taken together of course imply $(T_n - p)/(2p)^{1/2} \to_d N(0,1)$.
Even though $(T_n - p)/(2p)^{1/2} \to_d N(0,1)$ may be achieved without (11), in
special situations, we consider the quadratic approximation part and parcel
of the EL distribution theory, and find it natural here to take "EL works for
large $n$ and $p$" to mean both parts of (11).

Various sets of conditions may now be put up to secure (11), depending
on the nature of the $X_{n,i}$ of (9). The following result provides an easily
stated sufficient condition for (11) in the i.i.d. case, and has a number of
applications that will be discussed in the next section.

Theorem 4.1. Suppose that the $X_{n,i}$'s are i.i.d. with mean zero and
variance matrix $\Sigma_n$. First, if all components of $X_{n,i}$ are uniformly bounded
and the eigenvalues of $\Sigma_n$ stay away from zero and infinity, then $p^3/n \to 0$
implies (11). Second, in case the components are not bounded, assume they
have a uniformly bounded $q$th moment, for some $q > 2$, and again that the
eigenvalues of $\Sigma_n$ stay away from zero and infinity. Then $p^{3+6/(q-2)}/n \to 0$
implies (11).

The complete proof of Theorem 4.1 involves separate efforts for the two 
parts of (11), each of interest in its own right. We first explain the main 
ingredients in what makes the first part go through. 

Introduce the random concave functions

\[
G_n(\lambda) = 2\sum_{i=1}^n \log(1 + \lambda^{\rm t}X_{n,i}/\sqrt{n}) \quad\text{and}\quad
G_n^*(\lambda) = 2\lambda^{\rm t}\sqrt{n}\,\bar X_n - \lambda^{\rm t}S_n\lambda. \tag{12}
\]

These are similar to the two random functions worked with in Remark 2.7,
but are here defined in a somewhat different context. It is to be noted that
$T_n$ of (10) is the same as $\max G_n = G_n(\hat\lambda)$, say, where the maximizer $\hat\lambda$ also
is the solution to $\sum_{i=1}^n X_{n,i}/(1 + \hat\lambda^{\rm t}X_{n,i}/\sqrt{n}) = 0$. On the other hand, the
maximizer of $G_n^*$ is $\lambda^* = S_n^{-1}\sqrt{n}\,\bar X_n$, and its maximum is precisely $T_n^*$. While
$G_n^*$ is defined over all of $\mathbb{R}^p$, a little care is required for $G_n$, which is defined
only where $\lambda^{\rm t}X_{n,i}/\sqrt{n} > -1$ for $i = 1,\dots,n$. In view of the $(p/\sqrt{n})D_n \to_{\rm pr} 0$
condition that we nearly always shall impose, the (12) formula for $G_n$ holds
with probability going to 1 for all $\lambda$ of size $O(p)$. We now provide basic
"generic form" conditions for the first part of (11) to hold:

(D0) $P\{{\rm EL}_n(\theta_n) = 0\} \to 0$.
(D1) $(p/\sqrt{n})D_n \to_{\rm pr} 0$.
(D2) $\|\hat\lambda\| = O_{\rm pr}(p^{1/2})$.
(D3) $\|\lambda^*\| = O_{\rm pr}(p^{1/2})$.
(D4) ${\rm maxeig}(S_n) = O_{\rm pr}(1)$.

Proposition 4.1. Conditions (D0)-(D4) imply $(T_n - T_n^*)/p^{1/2} \to_{\rm pr} 0$.
If in addition $(p^{3/2}/\sqrt{n})D_n \to_{\rm pr} 0$ in (D1), then $T_n - T_n^* \to_{\rm pr} 0$. Furthermore,
for both situations dealt with in Theorem 4.1, the conditions given there
imply (D0)-(D4).
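
The relation between $T_n = \max G_n$ and its quadratic approximation $T_n^* = \max G_n^*$ can be explored numerically. The sketch below is only an illustration under stated assumptions: a damped Newton iteration is one possible way to maximize $G_n$ (the text does not prescribe an algorithm), the data are simulated standard normal vectors, and the function name is hypothetical.

```python
import numpy as np

def el_quadratic_comparison(X, max_iter=100, tol=1e-10):
    """Return (T_n, T_n_star, lam_hat, lam_star) for the rows X_{n,i} of X."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = X.T @ X / n
    lam_star = np.linalg.solve(S, np.sqrt(n) * xbar)   # maximizer of G_n^*
    T_star = float(np.sqrt(n) * xbar @ lam_star)       # n xbar^t S^{-1} xbar

    Xs = X / np.sqrt(n)

    def G(lam):
        a = 1.0 + Xs @ lam
        # G_n is only defined where all 1 + lam^t X_i / sqrt(n) are positive
        return 2.0 * np.sum(np.log(a)) if np.all(a > 0.0) else -np.inf

    lam = np.zeros(p)
    for _ in range(max_iter):
        a = 1.0 + Xs @ lam
        grad = 2.0 * Xs.T @ (1.0 / a)
        if np.linalg.norm(grad) < tol:
            break
        hess = -2.0 * (Xs / a[:, None] ** 2).T @ Xs    # negative definite
        step = np.linalg.solve(hess, -grad)            # Newton direction
        s = 1.0
        while G(lam + s * step) < G(lam) and s > 1e-12:
            s *= 0.5                                   # step halving keeps lam feasible
        lam = lam + s * step
    return float(G(lam)), T_star, lam, lam_star

rng = np.random.default_rng(2)
n, p = 2000, 5
X = rng.standard_normal((n, p))                        # mean-zero X_{n,i} (simulated)
T_n, T_star, lam_hat, lam_star = el_quadratic_comparison(X)
```

Starting from $\lambda = 0$, the first Newton step equals $\lambda^*$, so the iteration can be read as successive corrections to the quadratic approximation.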

Let us next focus on the second part of (11). Assume there is a population
version $\Sigma_n$ of $S_n$ and consider $T_n^0 = n\bar X_n^{\rm t}\Sigma_n^{-1}\bar X_n$; when the $X_{n,i}$ are i.i.d.,
then $\Sigma_n$ is their variance matrix. Define

\[
L_n = \|S_n - \Sigma_n\| = \max_{j,k}|S_{n,j,k} - \Sigma_{n,j,k}|. \tag{13}
\]

When $L_n$ is small, a well-behaved $\Sigma_n$ leads to a well-behaved $S_n$. We note
that for any unit vector $u$, $|u^{\rm t}S_nu - u^{\rm t}\Sigma_nu| \le \sum_{j,k}|u_ju_k|L_n \le pL_n$, implying
in particular that the range of eigenvalues for $S_n$ is within $pL_n$ of the range
of eigenvalues for $\Sigma_n$. Also, ${\rm Tr}(S_n)$ is within $pL_n$ of ${\rm Tr}(\Sigma_n)$. Now consider
the following conditions:

(D5) $p^{3/2}L_n \to_{\rm pr} 0$.

(D6) The eigenvalues of $\Sigma_n$ stay away from zero and infinity.
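
The bound $|u^{\rm t}S_nu - u^{\rm t}\Sigma_nu| \le pL_n$ behind these conditions is easy to check numerically. A minimal sketch, assuming simulated standard normal data with $\Sigma_n = I_p$ (an illustrative choice, as is the function name):

```python
import numpy as np

def max_entry_distance(S, Sigma):
    """L_n = max_{j,k} |S_{n,j,k} - Sigma_{n,j,k}| as in (13)."""
    return float(np.max(np.abs(S - Sigma)))

rng = np.random.default_rng(3)
n, p = 5000, 10
Sigma = np.eye(p)                       # population variance matrix (assumed)
X = rng.standard_normal((n, p))
S = X.T @ X / n                         # empirical version S_n

L_n = max_entry_distance(S, Sigma)
eig_S = np.linalg.eigvalsh(S)           # ascending eigenvalues of S_n
eig_Sigma = np.linalg.eigvalsh(Sigma)

print("L_n =", L_n, " p*L_n =", p * L_n, " p^{3/2}*L_n =", p**1.5 * L_n)
print("eigenvalue range of S_n:", eig_S[0], eig_S[-1])
```

The printed quantity $p^{3/2}L_n$ is the one that condition (D5) requires to vanish.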

Proposition 4.2. Conditions (D5)-(D6) imply $(T_n^* - T_n^0)/p^{1/2} \to_{\rm pr} 0$.
Furthermore, the assumptions detailed in Theorem 4.1 imply (D5)-(D6), for
each of the two situations. Also, in the i.i.d. case, provided $E|X_{n,i,j}|^6$ stays
bounded for all components $j \le p$, then the weak condition $p/n \to 0$ secures
approximate $\chi^2_p$-ness in the sense that $(T_n^0 - p)/(2p)^{1/2} \to_d N(0,1)$.

While Theorem 4.1 and corollaries indirectly noted above are satisfactory
for several classes of problems, there are other situations of interest
where the smallest eigenvalues, of $S_n$ and $\Sigma_n$, go to zero. This will typically
lead to condition (D3) failing. For this reason we provide a parallel result
that demands less regarding the distribution of eigenvalues. For the case
of i.i.d. variables $X_{n,i} = m_n(Z_i,\theta_n)$ of mean zero and variance matrix $\Sigma_n$,
consider $X_{n,i}^* = \Sigma_n^{-1/2}X_{n,i}$, and let $S_n^*$ be the empirical variance matrix of
these, that is, $S_n^* = n^{-1}\sum_{i=1}^n X_{n,i}^*(X_{n,i}^*)^{\rm t} = \Sigma_n^{-1/2}S_n\Sigma_n^{-1/2}$. The eigenvalues of
$S_n^*$ are often more well-behaved than those of $S_n$.

Proposition 4.3. Consider the EL setup of (8), with $m(Z_i,\mu) = Z_i - \mu$,
for inference about the mean $\mu_n$ of $Z_i$. The conclusions of Theorem 4.1
continue to hold, without the condition on eigenvalues for $\Sigma_n$, as long as the
conditions there are met for the transformed variables $Z_{n,i}^* = \Sigma_n^{-1/2}(Z_{n,i} - \mu_n)$.

For another remark of relevance, write $\gamma_{1,n}$ and $\gamma_{p,n}$ for the largest and
smallest eigenvalues of $\Sigma_n$. Yet another version of our main result emerges
by dividing the $Z_i$'s by $\gamma_{p,n}^{1/2}$, to avoid small eigenvalues. This gives a parallel
result to those of Theorem 4.1 and Proposition 4.3, where the essential
condition is that the ratio $\gamma_{1,n}/\gamma_{p,n}$ remains bounded. See in this connection
also Owen [(2001), page 86] where stability of this ratio is crucial also for
some problems associated with fixed $p$.

For the four applications given in Section 5, along with a broad variety 
of others, the above development suffices. There are nevertheless situations 
where further variations on the conditions are required. In the following 
subsection the requirements (D0)-(D6) are discussed and followed up with 
further conditions that suffice for the different requirements to hold. We 
also give some useful lemmas that partly are needed to prove Propositions 
4.1 and 4.2, and hence the master Theorem 4.1, and partly give the oppor- 
tunity to prove versions of (11) under sets of conditions outside those of 
i.i.d. structures, like in regression models. 

4.2. On verifying conditions (D0)-(D6). The EL operation (8) degenerates
if zero is outside the convex hull spanned by $X_{n,1},\dots,X_{n,n}$ in $\mathbb{R}^p$. This
may happen more frequently in higher dimensions. Condition (D0) amounts
to the EL giving a positive maximum, with probability tending to 1 with
$n$, and we now discuss conditions that secure this. That zero is outside the
convex hull corresponds to there being a unit vector $u$ for which $u^{\rm t}X_{n,i} > 0$
for each $i$. So zero is inside the interior of the convex hull if $H_n(u) < 0$ for
each unit vector $u$, where $H_n(u) = \min_{i\le n}u^{\rm t}X_{n,i}$. Thus condition (D0) is
implied by

\[
P\Bigl\{\max_{u\in\mathcal{U}_p}H_n(u) < 0\Bigr\} \to 1 \quad\text{as } n\to\infty, \tag{14}
\]

where $\mathcal{U}_p$ is the set of unit vectors in $\mathbb{R}^p$. This and several later problems will
be handled separately for two types of situations: (a) the components of $X_{n,i}$
remain uniformly bounded, and (b) the components may be unbounded, but
reasonable moment conditions prevail. It will be useful to deal with (D1) in
connection with (14), that is, (D0). Yet another useful regularity condition
is as follows.

(D7) For some $q > 2$, the sequence of $E\|X_{n,i}/p^{1/2}\|^q$ stays bounded; and
for this $q$ it holds that $p^{3+6/(q-2)}/n \to 0$.

Lemma 4.1. (a) If the components of $X_{n,i}$ remain uniformly bounded,
then $p^3/n \to 0$ implies (D1). (b) If (D7) holds, then again (D1) holds.

Lemma 4.2. For the i.i.d. case, assume there exists a positive $\varepsilon$ such
that $r_p(u,\varepsilon) = P\{u^{\rm t}X_{n,i} > -\varepsilon\} \le r < 1$ for all $u \in \mathcal{U}_p$; in particular, this
necessitates a positive lower bound for the eigenvalues of $\Sigma_n$. (a) If the components
of $X_{n,i}$ are uniformly bounded, then the requirement $(p\log p)/n \to 0$
as $n \to \infty$ secures (14), that is, (D0). (b) Also (D7) implies (14).

Next we assess the sizes of the maximizers $\hat\lambda$ and $\lambda^*$ of $G_n$ and $G_n^*$. We
also need to inspect the size of $L_n$ of (13).

Lemma 4.3. Suppose that $\sqrt{n}\|\bar X_n\| = O_{\rm pr}(p^{1/2})$, that ${\rm mineig}(S_n)$ stays
away from zero in probability, and that (D1) holds. Then $\|\hat\lambda\| = O_{\rm pr}(p^{1/2})$,
that is, (D2) holds.

Note for the i.i.d. case, where the $X_{n,i}$'s have a variance matrix $\Sigma_n$, then
$n\|\bar X_n\|^2$ is of the required size $O_{\rm pr}(p)$ if only ${\rm Tr}(\Sigma_n)/p$ stays bounded.

Lemma 4.4. For the i.i.d. case, assume that the $X_{n,i,j}$'s have finite $q$th-order
moments, for some $q \ge 4$, and let $A_n(p,q) = p^{-1}\sum_{j=1}^p E|X_{n,i,j}|^q$. Then,
for a positive constant $c(q)$,

P{L n > e} < - ^j 2 A n (p, q) for each positive e. 

It follows that when $q$th moments are bounded, then $p^{2+4/q}/n \to 0$ secures
$pL_n \to_{\rm pr} 0$ and in its turn well-behaved eigenvalues of $S_n$ under minimal
conditions on those of $\Sigma_n$. Similarly, the $p^{3+4/q}/n \to 0$ condition ensures
$p^{3/2}L_n \to_{\rm pr} 0$, that is, (D5). We note further that when the $X_{n,i}$'s are
uniformly bounded, then $p^2/n \to 0$ implies $pL_n \to_{\rm pr} 0$, whereas $p^3/n \to 0$
implies $p^{3/2}L_n \to_{\rm pr} 0$. This may be shown using techniques of the proof of
Lemma 4.4. In situations where the $X_{n,i}$'s have moments of all orders, the
growth conditions here come close to those for the case of bounded variables.
If they are normal, for example, then $\|X_{n,i}\|$ is bounded by a variable
of the type $c(\chi^2_p)^{1/2}$ for a suitable $c$, and one may show that $p^3/n \to 0$ and
$p^4/n \to 0$ again suffice for $T_n^* - T_n$ being respectively $o_{\rm pr}(p^{1/2})$ and $o_{\rm pr}(1)$.

Lemma 4.5. Suppose $\|T_n^*\| = O_{\rm pr}(p)$ and that ${\rm mineig}(S_n)$ stays away
from zero in probability. Then $\|\lambda^*\| = O_{\rm pr}(p^{1/2})$, that is, (D3) holds.

For condition (D4) we note for the i.i.d. case that if ${\rm maxeig}(\Sigma_n)$ is
bounded, and $pL_n \to_{\rm pr} 0$, then (D4) holds, in view of comments made after
(13).

Verifying eigenvalue conditions, for either $S_n$ or $\Sigma_n$, is sometimes technically
hard. A theorem of Bai and Yin (1993) works for the case of $Z_i$ having
independent components with zero means and unit variances, in which case
the linear growth condition $p/n \to y \in (0,1)$ ensures that the smallest and
largest of the eigenvalues of $S_n$ tend a.s. to $(1-\sqrt{y})^2$ and $(1+\sqrt{y})^2$, respectively.
Inspection of their proof reveals that a version holds also when $y = 0$,
namely that the smallest and largest of eigenvalues then tend in probability
to 1. See also Bai (1999) and the ensuing discussion.
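
The Bai and Yin limits can be seen in a small simulation. This is an illustrative sketch only, assuming standard normal components and the particular ratio $p/n = 0.25$; neither choice comes from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 500
y = p / n                                  # growth ratio p/n = 0.25 (assumed)

Z = rng.standard_normal((n, p))            # independent components, mean 0, variance 1
S = Z.T @ Z / n                            # S_n = n^{-1} sum_i Z_i Z_i^t
eig = np.linalg.eigvalsh(S)                # ascending order

lower = (1.0 - np.sqrt(y)) ** 2            # limit of the smallest eigenvalue
upper = (1.0 + np.sqrt(y)) ** 2            # limit of the largest eigenvalue
print("smallest:", eig[0], "limit:", lower)
print("largest: ", eig[-1], "limit:", upper)
```

Repeating the experiment with $p$ fixed and $n$ growing pushes both extremes toward 1, in line with the $y = 0$ version mentioned above.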

5. Applications with growing p. This section provides some examples 
where there is a growing number of parameters, and where the theory de- 
veloped in Section 4 guarantees that the empirical likelihood methodology 
still is applicable. 

5.1. Many independent means. Suppose that $Z_1,\dots,Z_n$ correspond to
$p$ independent samples $Z_{1,j},\dots,Z_{n,j}$, with mean $\mu_{0,j}$ and standard deviation
$\sigma_j$, for $j = 1,\dots,p$, assuming for simplicity of presentation that the
sample size is the same for each group $j = 1,\dots,p$. EL may then be used to
make simultaneous inference for the vector of mean parameters $\mu_0$. Consider
the normalized random vector $U_i$ with components $(Z_{i,j} - \mu_{0,j})/\sigma_j$, which
has mean zero and variance matrix $I_p$. Results of Section 4 imply that the
EL works properly, even when $p$ grows, provided $p^3/n \to 0$ and that the
$U_i$ components stay uniformly bounded, for example, via Proposition 4.3.
This is secured by the eigenvalue distribution result of Bai and Yin (1993)
mentioned above.

Similar results may be reached in other models with a growing number 
of mean type parameters. An example is analysis of variance with a large 
number of groups; cf. Akritas and Arnold (2000). Our theory also supports 
the use of EL theory when multiple comparisons between groups are made, 
since the variance matrix of a collection of such differences of means is well- 
behaved enough to have its eigenvalues away from zero and infinity; that is, 
Theorem 4.1 applies.

5.2. Poisson regression. Assume that $Y_i$ given $z_i$ is Poisson with parameter
$\mu_i = \exp(z_i^{\rm t}\beta)$, where $z_i$ is a $p$-dimensional covariate vector and $\beta$ a
parameter vector of the same length. EL may be used with ${\rm EL}_n(\beta)$ defined
as in (8), via estimating equations $\sum_{i=1}^n w_i\{Y_i - \exp(z_i^{\rm t}\beta)\}z_i = 0$. Assume
that the covariate vectors $z_i$ are i.i.d. from some distribution, which we
for an easy concrete illustration take to be the standard $p$-variate normal,
and let us postulate further that the sequence of $\beta$ vectors is such that
$\|\beta\|^2 = \sum_{j=1}^p \beta_j^2$ remains bounded, as $n$ and $p$ are allowed to grow. This fits
the setup of Section 4 with $X_{n,i} = (Y_i - \mu_i)z_i$, which have variance matrix
$\Sigma_n = \exp(\tfrac12\|\beta\|^2)(I_p + \beta\beta^{\rm t})$. We see from this that its eigenvalues all lie
between $\exp(\tfrac12\|\beta\|^2)$ and $\exp(\tfrac12\|\beta\|^2)(1 + \|\beta\|^2)$. The conditions of Theorem
4.1 hold, for each even $q$, from which we conclude that (11) holds as long as
$p^{3+\varepsilon}/n \to 0$, for some positive $\varepsilon$.
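
The closed form $\Sigma_n = \exp(\tfrac12\|\beta\|^2)(I_p + \beta\beta^{\rm t})$ and its eigenvalue range can be checked by simulation. The sketch below is a hedged illustration: the particular $\beta$, the sample size and the function name are assumptions chosen for the example.

```python
import numpy as np

def poisson_score_variance(beta):
    """Closed form Sigma_n = exp(||beta||^2 / 2) (I_p + beta beta^t) for N_p(0, I_p) covariates."""
    beta = np.asarray(beta, dtype=float)
    return np.exp(0.5 * beta @ beta) * (np.eye(beta.size) + np.outer(beta, beta))

rng = np.random.default_rng(5)
p, n = 3, 200_000
beta = np.array([0.3, -0.2, 0.1])          # hypothetical parameter vector

z = rng.standard_normal((n, p))            # standard p-variate normal covariates
mu = np.exp(z @ beta)                      # Poisson means mu_i = exp(z_i^t beta)
Y = rng.poisson(mu)
X = (Y - mu)[:, None] * z                  # X_{n,i} = (Y_i - mu_i) z_i
S_mc = X.T @ X / n                         # Monte Carlo estimate of the variance matrix

Sigma = poisson_score_variance(beta)
eig = np.linalg.eigvalsh(Sigma)            # ascending eigenvalues
b2 = float(beta @ beta)
print("eigenvalue range:", eig[0], eig[-1])
print("stated bounds:   ", np.exp(0.5 * b2), np.exp(0.5 * b2) * (1.0 + b2))
```

The smallest eigenvalue has multiplicity $p-1$ and the largest corresponds to the direction of $\beta$, which is why the two stated bounds are attained exactly.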

This example may be generalized in various ways. First, the only point about
the $N_p(0,I_p)$ distributional assumption for the covariates here was to get
an explicit and easy $\Sigma_n$ matrix, and variations are easily constructed. Second,
results can be derived inside the more usual regression framework where
$z_1,\dots,z_n$ are considered known covariate vectors. Basically, this involves the
variance matrices $\Sigma_n = n^{-1}\sum_{i=1}^n \mu_iz_iz_i^{\rm t}$ and $S_n = n^{-1}\sum_{i=1}^n (Y_i - \mu_i)^2z_iz_i^{\rm t}$, in
generalization of those worked with in Section 4. Under a Lindeberg condition,
combined with the requirement that the $|z_i^{\rm t}\beta|$ are bounded uniformly
as $p$ and $n$ grow (which means that all Poisson means should be bounded
away from zero and infinity), one may prove that conditions (D0)-(D6) are
fulfilled as long as $p^3/n \to 0$, using methods associated with proving Lemmas
4.1-4.5. Hence the desired conclusion (11) holds. Similar results are reached
for other generalized linear regression setups.

5.3. Testing $f = f_0$ via orthogonal expansions. For i.i.d. data $X_1,\dots,X_n$
from an unknown density, consider the growing class of models

\[
f(x) = f_0(x)\,c_p(a)^{-1}\exp\Bigl\{\sum_{j=1}^p a_j\psi_j(x)\Bigr\}.
\]

Here $f_0$ is a "start density," around which one models a flexible log-linear
structure for deviations, the $\psi_j$ functions are orthonormal w.r.t. $f_0$, that
is, $\int f_0\psi_j\psi_k\,dx = \delta_{j,k}$, and $c_p$ is the appropriate normalizing constant. Here
we can carry out EL analysis for $\xi = (\xi_1,\dots,\xi_p)^{\rm t}$, where $\xi_j = \int f\psi_j\,dx$, and
a growing $p$. This is done via the vectors $Z_i = (\psi_1(X_i),\dots,\psi_p(X_i))^{\rm t}$. The
eigenvalues of its variance matrix will typically be well behaved, with reasonable
conditions on $f$, and there is stability of fourth-order moments if,
for example, the $\psi_j$'s are bounded. Thus EL theory holds for analysis of the
$\xi_j$'s, if $p^3/n \to 0$. Consider in particular the problem of testing $f = f_0$, which
corresponds to the $a_j$'s being zero. The theory of Section 4 ensures that
$T_n = -2\log{\rm EL}_n(0) = 2\sum_{i=1}^n \log(1 + \hat\lambda^{\rm t}Z_i/\sqrt{n})$ is at most $O_{\rm pr}(p^{1/2})$ away
from $T_n^* = n\bar\psi_n^{\rm t}S_n^{-1}\bar\psi_n$, where $\bar\psi_n$ is the vector of averages $n^{-1}\sum_{i=1}^n \psi_j(X_i)$
and $S_n = n^{-1}\sum_{i=1}^n Z_iZ_i^{\rm t}$, provided only $p^3/n \to 0$, if the $\psi_j$'s are uniformly
bounded. This is since the variance matrix under the null hypothesis is simply
equal to $I_p$. Also, both $T_n$ and $T_n^*$ have null distributions close enough
to a $\chi^2_p$, again by Theorem 4.1.

5.4. Growing polynomial regression. Consider the regression model

\[
Y_i = \xi(X_i) + \varepsilon_i \quad\text{for } i = 1,\dots,n,
\]

where the pairs $(X_i,\varepsilon_i)$ are i.i.d., with $X_i$'s coming from some density $f$ and
the $\varepsilon_i$'s having mean zero and standard deviation $\sigma_0$. The main objective is
to make inference about $\xi(\cdot)$. We do not strive for the fullest generality in
this application of our theory, and are content to work with the following
scenario: $f$ is known (e.g., the uniform on the unit interval), and $\xi(x)$ may
be expanded in terms of basis functions $\psi_0,\psi_1,\psi_2,\dots$ that are orthonormal
w.r.t. $f$, that is, $\int f\psi_j\psi_k\,dx = \delta_{j,k}$, and where we take $\psi_0 = 1$. We might for
example take $\psi_j(x) = \phi_j(F(x))$ where the $\phi_j$'s are orthogonal w.r.t. the uniform
on the unit interval and $F$ the c.d.f. of $f$. Hence $\xi(x) = \sum_{j=0}^\infty b_j\psi_j(x)$,
where we assume that $E\xi(X)^2 = \sum_{j=0}^\infty b_j^2$ is finite, and also that $\xi(x)$ is
bounded.

In this setup, consider the $p$th-order model

$$Y_i = \xi_p(X_i) + e_i \quad\text{with}\quad \xi_p(x) = \sum_{j=0}^p b_j\psi_j(x) = \psi^{(p)}(x)^t b^{(p)},$$

where the residuals are $e_i = \sum_{j=p+1}^\infty b_j\psi_j(X_i) + \varepsilon_i$ with variance $\sigma_p^2 = \sigma_0^2 + \sum_{j=p+1}^\infty b_j^2$;
including more terms in the regression structure makes the residuals
smaller in size, and vice versa. Consider $Z_i = Y_i\psi^{(p)}(X_i)$, a vector of
dimension $p + 1$, with mean value seen to be $b^{(p)}$. We will consider conditions
under which $-2\log \mathrm{EL}_n(b^{(p)})$, based on $Z_1,\ldots,Z_n$, can be approximated by
a $\chi^2_{p+1}$ distribution.

The key to verifying the conditions of Theorem 4.1 lies in controlling the
sizes of the eigenvalues of the variance matrix of $Z_i$, which may be written

$$\Sigma_n = E\,Y_i^2\psi^{(p)}(X_i)\psi^{(p)}(X_i)^t - b^{(p)}(b^{(p)})^t = \sigma_0^2 I_p + \Omega_p,$$

where $I_p$ and $\Omega_p$ are of size $(p+1)\times(p+1)$ and where the elements of the
nonnegative definite $\Omega_p$ matrix are $\int \xi(x)^2\psi_j(x)\psi_k(x)f(x)\,dx - b_jb_k$. The
eigenvalues of $\Sigma_n$ take the form $\sigma_0^2 + \varphi_j$, where the $\varphi_j$'s are the eigenvalues
of $\Omega_p$, and are hence bounded downward by $\sigma_0^2$. They are also bounded
upward, since for any unit vector $u$, $u^t\Omega_p u$ is bounded by $M^2\int (u_0\psi_0 + \cdots +
u_p\psi_p)^2 f\,dx = M^2$, where $M$ bounds $|\xi(x)|$.
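
The eigenvalue bounds can be checked numerically in a simple case. The sketch below assumes $f$ uniform on the unit interval, the cosine basis $\psi_0 = 1$, $\psi_j(x) = \sqrt{2}\cos(j\pi x)$, and a hypothetical bounded regression function; integrals are replaced by midpoint-rule sums, and the name `variance_matrix` is illustrative only.

```python
import numpy as np

def variance_matrix(xi, p, sigma0, m=20000):
    """Approximate Sigma_n = sigma0^2 I + Omega_p by midpoint quadrature on [0, 1].
    Assumptions: f is uniform, psi_0 = 1 and psi_j(x) = sqrt(2) cos(j pi x)."""
    x = (np.arange(m) + 0.5) / m
    psi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(p + 1)))
    psi[:, 0] = 1.0
    xi_x = xi(x)
    b = psi.T @ xi_x / m                                   # b_j = int xi psi_j f dx
    omega = (psi * (xi_x ** 2)[:, None]).T @ psi / m - np.outer(b, b)
    return sigma0 ** 2 * np.eye(p + 1) + omega, np.max(np.abs(xi_x))

sigma0 = 0.5
cov, big_m = variance_matrix(lambda x: np.sin(2 * np.pi * x) + 0.5 * x, p=10, sigma0=sigma0)
eig = np.linalg.eigvalsh(cov)
lower, upper = sigma0 ** 2, sigma0 ** 2 + big_m ** 2       # bounds claimed in the text
```

All entries of `eig` should fall between `lower` and `upper`, in line with the bounds $\sigma_0^2$ and $\sigma_0^2 + M^2$ derived above.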
As explained in the Introduction, we may apply our results to produce
confidence regions for a subset of the $b_j$ parameters, to test whether some
of them are equal to zero, and to make inference for any linear combination.
For example, suppose we are interested in a simultaneous confidence region
for the regression function $\xi_p$ at certain locations $x_1,\ldots,x_q$. Even though
$p$ may be very large, provided $q$ grows slowly enough with $n$ our results
apply to the focus parameter $\phi = (\xi_p(x_1),\ldots,\xi_p(x_q))$, because $\phi = f(b^{(p)})$
is a linear function of $b^{(p)}$ and the confidence region can be based on the
transformed data $f(Z_i) = (\sum_{j=0}^p Y_i\psi_j(X_i)\psi_j(x_l))_{l=1,\ldots,q}$. Focus parameters
defined by nonlinear functions would also be of interest, but this is beyond
the scope of the paper, even in the case of a one-dimensional parameter such
as $\phi = \max_{l=1,\ldots,q}|\xi_p(x_l)|$.
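
The transformed data $f(Z_i)$ for a focus parameter at a few locations can be computed directly. The sketch below is a hedged illustration: it assumes $f$ uniform on the unit interval with a cosine basis, uses made-up coefficients $b_0,\ldots,b_p$, and the helper names `basis` and `transformed_data` are hypothetical.

```python
import numpy as np

def basis(x, p):
    """Columns psi_0 = 1 and psi_j(x) = sqrt(2) cos(j pi x), j = 1..p (assumed basis)."""
    psi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(p + 1)))
    psi[:, 0] = 1.0
    return psi

def transformed_data(x, y, locations, p):
    """Row i holds f(Z_i) = (sum_j Y_i psi_j(X_i) psi_j(x_l)) for l = 1..q."""
    return (y[:, None] * basis(x, p)) @ basis(locations, p).T

rng = np.random.default_rng(1)
n, p = 200_000, 3
b = np.array([1.0, 0.5, 0.0, -0.25])            # hypothetical coefficients b_0..b_p
x = rng.uniform(size=n)
y = basis(x, p) @ b + 0.5 * rng.normal(size=n)  # here xi = xi_p exactly
locations = np.array([0.2, 0.5, 0.8])
fz = transformed_data(x, y, locations, p)       # shape (n, q)
phi_hat = fz.mean(axis=0)                       # sample mean of the transformed data
phi_true = basis(locations, p) @ b              # (xi_p(x_1), ..., xi_p(x_q))
```

The rows of `fz` are $q$-dimensional whatever the value of $p$, which is why an EL region for the focus parameter only has to cope with $q$ estimating equations.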

6. Concluding remarks. 

6.1. Nonstandard limit distributions. Here we give a toy example in
which $T_n = -2\log \mathrm{EL}_n$ has a limit distribution different from $U^tV_2^{-1}U$ in
Theorem 2.1. Let $X_i \sim N(\theta_0,\sigma_i^2)$ be independent, and suppose $\sum_{i=1}^\infty \sigma_i^2 < \infty$.
Consider the unbiased estimating function $m(X,\theta) = X - \theta$. Using steps
from the proof of our Theorem 2.1, it can be shown that $T_n \to_d T$, the
maximum of the process $G(\lambda) = 2\sum_{i=1}^\infty \log(1 + \lambda\sigma_iZ_i)$ over the random interval
$|\lambda| < 1/D$, where $D = \max_{i\ge1}\sigma_i|Z_i|$ and the $Z_i$ are independent standard
normals. In this case, (A0)-(A2) hold [with a random limit in (A2)], but
(A3), which is needed to dispose of the remainder term in the quadratic
approximation to $T_n$, fails, hence the nonstandard limit.

6.2. Weighted EL. The basic EL setup can be generalized to allow for
weights. In the framework of Section 2, we can place a weight $\tau_i$ in front of
each term $m_n(X_i,\theta,\hat h)$ in $\mathrm{EL}_n(\theta)$. This would be useful in situations where
the $X_i$'s have different precision. Conditions sufficing for $-2\log \mathrm{EL}_n(\theta_0)$ to
converge in distribution are readily developed, paralleling (A0)-(A3).

6.3. Joint convergence of maximum and maximizer. Our proof of
Theorem 2.1 (in the case $a_n = 1$) shows that $T_n = \sup_\lambda G_n(\lambda)$ with probability
tending to 1, and $\hat\lambda = \arg\max_\lambda G_n(\lambda) = O_{\rm pr}(1)$, where $G_n(\lambda) = 2\sum_{i=1}^n \log(1 +
\lambda^tX_{n,i})$. Appealing to Theorem 5.1 of Banerjee and McKeague (2007), we
can then infer the more general result that $(\hat\lambda, T_n) \to_d (V_2^{-1}U, U^tV_2^{-1}U)$.
On the computational side, the proof also indicates that maximization or
equation-solving algorithms should work better with $\lambda^* = V_n^{-1}U_n$ as starting
point, rather than, for example, zero.
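
The computational remark can be illustrated with a short sketch. It is not the authors' algorithm: it is a plain damped Newton iteration for maximizing $G_n(\lambda) = 2\sum_i \log(1 + \lambda^tX_{n,i})$, started at $\lambda^* = V_n^{-1}U_n$ with $U_n = \sum_i X_{n,i}$ and $V_n = \sum_i X_{n,i}X_{n,i}^t$. The function name `el_statistic` and the simulated data are assumptions made for the example.

```python
import numpy as np

def el_statistic(x, max_iter=50, tol=1e-10):
    """Maximize G_n(lam) = 2 * sum_i log(1 + lam^t x_i) by damped Newton steps.
    Returns (T_n, lam_hat, T_n^*), where T_n^* = U_n^t V_n^{-1} U_n."""
    u_n = x.sum(axis=0)
    v_n = x.T @ x
    lam = np.linalg.solve(v_n, u_n)            # starting point lam^* = V_n^{-1} U_n

    def g(l):
        w = 1.0 + x @ l
        return -np.inf if np.any(w <= 0) else 2.0 * np.sum(np.log(w))

    while not np.isfinite(g(lam)):             # shrink toward zero if lam^* is infeasible
        lam = 0.5 * lam
    for _ in range(max_iter):
        w = 1.0 + x @ lam
        xw = x / w[:, None]
        grad = 2.0 * xw.sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = -2.0 * xw.T @ xw                # negative definite, so G_n is concave
        step = np.linalg.solve(hess, -grad)    # Newton ascent direction
        t = 1.0
        while g(lam + t * step) < g(lam):      # halve the step until G_n does not decrease
            t *= 0.5
        lam = lam + t * step
    return g(lam), lam, u_n @ np.linalg.solve(v_n, u_n)

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=(n, 2)) / np.sqrt(n)       # X_{n,i} = (data_i - theta_0) / sqrt(n), theta_0 = 0
t_n, lam_hat, t_star = el_statistic(x)
```

At the maximizer the estimating equation $\sum_i X_{n,i}/(1 + \hat\lambda^tX_{n,i}) = 0$ holds, and `t_n` is expected to be close to the quadratic approximation `t_star`, which is why the Newton iteration started at $\lambda^*$ usually needs only a few steps.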
APPENDIX

Here we provide proofs of theorems and claims presented earlier in our 
article. 

Proof of Theorem 2.1. The basic steps and notation of the proof
were given in Remark 2.7. It remains to show that $T_n - T_n^* = o_{\rm pr}(a_n)$, where
$T_n = \sup G_n$ and $T_n^* = \sup G_n^*$. First we determine the stochastic order of $\hat\lambda$.
Write $\hat\lambda = \|\hat\lambda\|u$, in terms of a random unit vector $u$. As in Owen [(2001),
page 220] we have

$$\|\hat\lambda\|(u^tV_nu - D_nu^tU_n) \le u^tU_n, \tag{15}$$

where $D_n = \max_{i\le n}\|X_{n,i}\|$. But $u^tV_nu \ge \mathrm{mineig}(V_n) = O_{\rm pr}(a_n^{-1})$, $u^tU_n =
O_{\rm pr}(1)$ and $D_nu^tU_n = o_{\rm pr}(a_n^{-1})$, so $\|\hat\lambda\| = O_{\rm pr}(a_n)$. Moreover, $\lambda^* = V_n^{-1}U_n$
when $V_n$ is invertible, so $\lambda^*$ is of the same stochastic order $O_{\rm pr}(a_n)$ as $\hat\lambda$.

Write $\log(1 + x) = x - \frac12x^2 + \frac13x^3h(x)$, with $|h(x)| \le 2$ for $|x| \le \frac12$. This
gives, for any $c > 0$ and $\|\lambda\| \le c$,

$$G_n(\lambda) = 2\lambda^tU_n - \lambda^tV_n\lambda + r_n(\lambda), \tag{16}$$

where

$$|r_n(\lambda)| \le (2/3)\sum_{i=1}^n |\lambda^tX_{n,i}|^3|h(\lambda^tX_{n,i})| \le (4/3)\|\lambda\|D_n\lambda^tV_n\lambda \le (4/3)c^3D_n\,\mathrm{maxeig}(V_n),$$

provided $cD_n \le \frac12$. With $T_{n,c}$ and $T_{n,c}^*$ denoting the maxima of $G_n$ and $G_n^*$
over the ball $\Omega_n(c) = \{\lambda : \|\lambda\| \le ca_n\}$, we have

$$|T_{n,c}/a_n - T_{n,c}^*/a_n| \le (1/a_n)\max\{|r_n(\lambda)| : \|\lambda\| \le ca_n\} \le (4/3)c^3a_nD_n\,\mathrm{maxeig}(a_nV_n),$$

as long as $ca_nD_n \le \frac12$. Choose $c$ big enough to have both $\hat\lambda$ and $\lambda^*$ inside
$\Omega_n(c)$ with probability above $1 - \eta$, for some preassigned $\eta$. Then

$$P\{|T_n/a_n - T_n^*/a_n| \ge \varepsilon\} \le P\{(4/3)c^3a_nD_n\,\mathrm{maxeig}(a_nV_n) \ge \varepsilon\}
+ P\{\|\hat\lambda\| > ca_n\} + P\{\|\lambda^*\| > ca_n\} + P\{ca_nD_n > \tfrac12\}.$$

Hence the lim-sup of the probability sequence on the left is bounded by $2\eta$.
Since $\eta$ was arbitrary, $T_n/a_n$ and $T_n^*/a_n$ must have the same limit
distribution, namely $U^tV_2^{-1}U$. □
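
The bound $|h(x)| \le 2$ for $|x| \le \frac12$ used in the expansion of $\log(1+x)$ is easy to confirm numerically. The following is a small check rather than part of the proof; the grid and the cutoff near the origin are arbitrary choices made to avoid floating-point cancellation.

```python
import numpy as np

def h(x):
    """Remainder factor defined by log(1 + x) = x - x^2/2 + (x^3/3) h(x)."""
    return 3.0 * (np.log1p(x) - x + 0.5 * x ** 2) / x ** 3

grid = np.linspace(-0.5, 0.5, 2001)
grid = grid[np.abs(grid) > 1e-3]      # skip the origin, where h tends to 1 but 0/0 arises
h_max = np.max(np.abs(h(grid)))
```

The largest value occurs at $x = -\frac12$ and is about 1.64, so the constant 2 in the proof leaves some slack.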
Proof of the claim of Remark 2.2. Conditions (A4) and (A5) with
$a_n = 1$ imply that, given any real sequence $\delta_n \downarrow 0$,

$$\sup_{\|\theta-\theta_0\|\le\delta_n,\ h\in\mathcal H}\Bigl\|\sum_{i=1}^n\{m_n^{\otimes2}(X_i,\theta,h) - m_n^{\otimes2}(X_i,\theta_0,h)\}\Bigr\| \to_{\rm pr} 0.$$

The consistency of $\hat\theta$ then implies

$$R_n = \sum_{i=1}^n\{m_n^{\otimes2}(X_i,\hat\theta,\hat h) - m_n^{\otimes2}(X_i,\theta_0,\hat h)\} \to_{\rm pr} 0.$$

Thus

$$|\hat V_2 - V_2| \le |R_n| + \Bigl|\sum_{i=1}^n m_n^{\otimes2}(X_i,\theta_0,\hat h) - V_2\Bigr|,$$

where we have used assumption (A2) for the last term, so $\hat V_2$ consistently
estimates $V_2$. □

Proof of Theorem 2.2. By (2), the singular value theorem applied
to $\hat V_2^{-1}$ and $V_2^{-1}$, along with the Cramér-Wold theorem, it suffices to show
that $\hat V_2 \to_{\rm pr} V_2$ and that

$$P^*\{\sqrt n[M_n^*(\hat\theta,\hat h^*) - M_n(\hat\theta,\hat h)] \le t\} - P\{U \le t\} = o_{\rm pr}(1).$$

The former follows from Remark 2.2, under conditions (A4) and (A5). For
the latter, define, for any sequences $a_n^1, a_n^2 \downarrow 0$,

$$A_{n,a_n} = \Bigl\{\|\hat\theta-\theta_0\| \le a_n^1,\ \sup_t|B_n(t)| \le a_n^2,\ \sup\|C_n(\theta,h)\| \le a_n^2n^{-1/2},\ \|\hat h - h_0\|_{\mathcal H} \le a_n^1n^{-1/4}\Bigr\},$$

where $B_n(t)$, respectively $C_n(\theta,h)$, is the expression between absolute values
(norm-signs) in condition (B1), respectively (B2), and where the supremum
for $C_n$ is taken over $\|\theta-\theta_0\| \le a_n^1$, $\|h - h_0\|_{\mathcal H} \le a_n^1$. Then, by conditions
(B1), (B2), (B4) and the consistency of $\hat\theta$, $a_n^1$ and $a_n^2$ can be chosen such
that $P(A_{n,a_n}) \to 1$ as $n$ tends to infinity. Hence it suffices to establish the
convergence in probability, conditionally on the event $A_{n,a_n}$. It now follows
from condition (B5) that

$$\|M_n^*(\hat\theta,\hat h^*) - M_n^*(\hat\theta,\hat h) - \Gamma(\hat\theta,\hat h)[\hat h^* - \hat h]\|
= \|M_n(\hat\theta,\hat h^*) - M_n(\hat\theta,\hat h) - \Gamma(\hat\theta,\hat h)[\hat h^* - \hat h]\| + o_{P^*}(n^{-1/2})$$
$$\le c\|\hat h^* - \hat h\|_{\mathcal H}^2 + o_{P^*}(n^{-1/2}) = o_{P^*}(n^{-1/2}) \quad\text{a.s.}$$
In a similar way it follows from (B2), (B3) and (B4) that

$$\|M_n(\theta_0,\hat h) - M_n(\theta_0,h_0) - \Gamma(\theta_0,h_0)[\hat h - h_0]\| = o_{\rm pr}(n^{-1/2}).$$

Hence condition (B1) implies that

$$\sqrt n\{M_n^*(\hat\theta,\hat h^*) - M_n(\hat\theta,\hat h)\}
= \sqrt n\{M_n^*(\hat\theta,\hat h) - M_n(\hat\theta,\hat h) + \Gamma(\hat\theta,\hat h)[\hat h^* - \hat h]\} + o_{P^*}(1) \quad\text{a.s.}$$

has the same limiting distribution as

$$\sqrt n\{M_n(\theta_0,h_0) + \Gamma(\theta_0,h_0)[\hat h - h_0]\} = \sqrt n\,M_n(\theta_0,\hat h) + o_{\rm pr}(1),$$

which by condition (A1) converges to $U$. □

We note that Theorem 4.1 is an immediate consequence of Propositions
4.1 and 4.2. We now turn to proving these.

Proof of Proposition 4.1. That the conditions of Theorem 4.1
secure conditions (D0)-(D4) follows from Lemmas 4.1-4.4, proven below. Here
we show that these conditions imply $(T_n - T_n^*)/p^{1/2} \to_{\rm pr} 0$.

Using (12) and (16) we see that $G_n^*$ is the natural two-step Taylor
expansion approximation of $G_n$, and that $G_n = G_n^* + r_n$ with

$$|r_n(\lambda)| = \Bigl|(2/3)\sum_{i=1}^n(\lambda^tX_{n,i}/\sqrt n)^3h(\lambda^tX_{n,i}/\sqrt n)\Bigr| \le (4/3)\|\lambda\|(D_n/\sqrt n)\lambda^tS_n\lambda$$

as long as $\|\lambda\|D_n/\sqrt n \le \frac12$. Choose $c$ such that the set $\Omega_n(c) = \{\lambda : \|\lambda\| \le
cp^{1/2}\}$ catches both $\hat\lambda$ and $\lambda^*$, with probability at least $1 - \eta$ for all large $n$,
where $\eta$ is any preassigned positive number. Then

$$|r_n(\lambda)| \le (4/3)c^3p^{3/2}n^{-1/2}D_n\,\mathrm{maxeig}(S_n) \quad\text{for all } \lambda \in \Omega_n(c),$$

with arguments similar to those used for proving Theorem 2.1. This implies

$$P\{|T_n - T_n^*|/p^{1/2} \ge \varepsilon\} \le P\{(4/3)c^3pn^{-1/2}D_n\,\mathrm{maxeig}(S_n) \ge \varepsilon\}
+ P\{cp^{1/2}n^{-1/2}D_n > \tfrac12\} + P\{\hat\lambda \notin \Omega_n(c)\} + P\{\lambda^* \notin \Omega_n(c)\}.$$

Accordingly, under (D1)-(D4), the lim-sup of the left-hand side sequence is
bounded by $2\eta$, and is hence zero. The modified and stronger result $T_n -
T_n^* \to_{\rm pr} 0$ follows similarly under the stronger assumption. □

Proof of Proposition 4.2. That the conditions of Theorem 4.1
guarantee conditions (D5)-(D6) is a consequence of Lemma 4.4, proven below.
Here we show that these imply $(T_n^* - T_n^0)/p^{1/2} \to_{\rm pr} 0$.

To this end, write $S_n = \Sigma_n + \varepsilon_n$, so that $S_n^{-1} \approx \Sigma_n^{-1} - \Sigma_n^{-1}\varepsilon_n\Sigma_n^{-1}$ when
the elements of $\Sigma_n^{-1}\varepsilon_n$ become uniformly small, which they do in view of
(D5)-(D6). Hence

$$T_n^* - T_n^0 \approx -W_n^tE_nW_n,$$

where $W_n = \Sigma_n^{-1/2}\sqrt n\bar X_n$ is seen to have $\|W_n\| = O_{\rm pr}(p^{1/2})$ and
$E_n = \Sigma_n^{-1/2}\varepsilon_n\Sigma_n^{-1/2}$ must have the property that $|u^tE_nu| = O_{\rm pr}(\mu_n)$ for
each unit vector $u$. This proves the first claim. The second claim of the
proposition follows, after a transformation to new variables $X'_{n,i} = \Sigma_n^{-1/2}m_n(Z_i,\theta_n)$
with mean zero and variance matrix the identity matrix $I_p$, from efforts of
Portnoy (1988), who used a martingale central limit theorem. □

Proof of Lemma 4.1. When $|X_{n,i,j}| \le M$ for all components, then
$D_n \le Mp^{1/2}$, proving part (a). For the general case, to gauge the size of $D_n$
we cannot appeal to arguments involving the Borel-Cantelli lemma, as Owen
(2001), Chapter 11, could when analyzing the fixed $p$ situation. However,
$P\{(p/\sqrt n)D_n \ge \varepsilon\}$ is bounded by

$$\sum_{i=1}^n P\{\|X_{n,i}\| \ge \varepsilon\sqrt n/p\} \le n\,\frac{p^{3q/2}}{n^{q/2}\varepsilon^q}\max_{i\le n}E\|X_{n,i}/p^{1/2}\|^q,$$

which is seen to imply (b) of the lemma. □



Proof of Lemma 4.2. Observe that $|H_n(u) - H_n(v)| \le \|u - v\|D_n$.
The full surface of the $p$-dimensional unit ball may be covered by the union
of a finite number $C_{p,n}$ of rectangles with side length $\delta_n$, provided $C_{p,n}\delta_n^{p-1}$
is as big as $A_p = 2\pi^{p/2}/\Gamma(p/2)$, the surface area of the unit ball. Hence

$$\max_u H_n(u) \le \max_{u\in\mathcal U_{p,n}}H_n(u) + \delta_nD_n = H_n^* + \delta_nD_n,$$

where $\mathcal U_{p,n}$ is the finite set in question. To show (14) we demonstrate

$$P\{H_n^* < -\varepsilon\} \to 1 \quad\text{and}\quad P\{\delta_nD_n < \varepsilon\} \to 1.$$

We need to choose $\delta_n$ so that the second requirement holds, and then check
whether $P\{H_n^* \ge -\varepsilon\} \le C_{p,n}r^n$ is sufficient to meet the first requirement.
What is demanded is that $\log C_{p,n} + n\log r \to -\infty$, and this is seen to
correspond to $\{p\log(1/\delta_n)\}/n \to 0$.

(a) For the bounded components part we have $D_n \le Mp^{1/2}$ as with
Lemma 4.1, and may take $\delta_n = \varepsilon/(Mp^{1/2})$. In this case, therefore, the
$n^{-1}p\log p \to 0$ condition suffices for (14) to hold. (b) For this situation we
take $\delta_n = p/\sqrt n$, guaranteeing by Lemma 4.1 that $P\{\delta_nD_n < \varepsilon\} \to 1$. Some
analysis shows that $(p/n)\log(1/\delta_n) = n^{-1/2}x_n\log(1/x_n)$, with $x_n = p/\sqrt n$,
which tends to zero. □
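
One way to see where the requirement $\{p\log(1/\delta_n)\}/n \to 0$ comes from is sketched below. This is a reading of the covering argument, not a step taken from the original text, and it assumes that $r < 1$ stays bounded away from 1, that $p/n \to 0$, and that $C_{p,n}$ is chosen of the same order as $A_p/\delta_n^{p-1}$.

```latex
\begin{align*}
\log C_{p,n} &\approx \log A_p + (p-1)\log(1/\delta_n),
\qquad \log A_p \le \log 2 + \tfrac{p}{2}\log\pi, \\
\log C_{p,n} + n\log r
&\le n\Bigl\{\log r + \frac{\log 2 + \tfrac{p}{2}\log\pi}{n}
      + \frac{(p-1)\log(1/\delta_n)}{n}\Bigr\} + o(n)
\;\to\; -\infty
\quad\text{if } \frac{p\log(1/\delta_n)}{n} \to 0 .
\end{align*}
```

The upper bound on $\log A_p$ simply drops the term $-\log\Gamma(p/2)$, which is negative once $p$ exceeds 4, so the bracket is eventually dominated by $\log r < 0$.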
Proof of Lemma 4.3. Write $\hat\lambda = \|\hat\lambda\|u$, where the random $u$ has unit
length. One may argue as in Owen (2001), Chapter 11.2, to reach

$$\|\hat\lambda\|\{u^tS_nu - (D_n/\sqrt n)\sqrt n\,u^t\bar X_n\} \le \sqrt n\,u^t\bar X_n.$$

Here there is a positive $\delta$ such that the event $u^tS_nu \ge \delta$ has probability
tending to 1, while $D_nu^t\bar X_n \to_{\rm pr} 0$. The result follows. □

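Proof of Lemma 4.4. For the components of the $p \times p$ matrix $\varepsilon_n =
S_n - \Sigma_n$, a bounding operation gives

$$P\{|\varepsilon_{n,j,k}| \ge \varepsilon\} \le \frac{E|\varepsilon_{n,j,k}|^q}{\varepsilon^q} \le c(q)\frac{v_{n,j,k}^{q/2}}{n^{q/2}\varepsilon^q},$$

for a constant $c(q)$, by results of von Bahr (1965). Here $v_{n,j,k} = E(X_{n,i,j}X_{n,i,k})^2 -
(\Sigma_{n,j,k})^2$ is the variance of $X_{n,i,j}X_{n,i,k}$. This may be further bounded by

$$v_{n,j,k} \le (E|X_{n,i,j}|^4)^{1/2}(E|X_{n,i,k}|^4)^{1/2} \le (E|X_{n,i,j}|^q)^{2/q}(E|X_{n,i,k}|^q)^{2/q}$$

for $q \ge 4$. This leads to

$$P\{|\varepsilon_{n,j,k}| \ge \varepsilon\} \le c(q)\frac{E|X_{n,i,j}|^q\,E|X_{n,i,k}|^q}{n^{q/2}\varepsilon^q},$$

which is then seen to imply the lemma. □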



Proof of Lemma 4.5. We work with the explicit expression for $\lambda^*$,
which leads to a representation in the form of $S_n^{-1/2}W_n$, with $W_n = S_n^{-1/2}\sqrt n\bar X_n$.
Here $\|W_n\|$ is precisely $(T_n^*)^{1/2}$, hence of size $O_{\rm pr}(p^{1/2})$, while $\|S_n^{-1/2}u\| =
O_{\rm pr}(1)$ for all unit vectors $u$. This proves the lemma. □

Proof of Proposition 4.3. The central point to note is that the
empirical likelihood (8) is invariant with respect to the transformation that
maps data $Z_i$ to $A_nZ_i$, where $A_n$ is any nonsingular nonrandom $p \times p$
matrix. If $\mathrm{EL}_n(A_n\mu \mid A_n)$ is the empirical likelihood computed on the
basis of $Z_i' = A_nZ_i$, for the parameter $\tilde\mu = A_n\mu$, then $A_n$ cancels out of the
defining equation $\sum_{i=1}^n w_i(A_nZ_i - A_n\mu) = 0$, showing that $\mathrm{EL}_n(\tilde\mu \mid A_n)$ is
the same as $\mathrm{EL}_n(\mu)$ in (8), that is, independent of $A_n$ (and with the same
maximizing $w_i$'s). The same is true for the quadratic approximation $T_n^* =
n(\bar Z_n - \mu_n)^tS_n^{-1}(\bar Z_n - \mu_n)$ of (10). We may in particular employ $A_n = \Sigma_n^{-1/2}$,
where the resulting $A_nZ_i$ have variance matrix $I_p$. The proof of the proposition
now follows using arguments similar to those needed for Theorem 4.1 but
under the additional simplifying assumption that $\Sigma_n = I_p$. □
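
The invariance of the quadratic approximation under $Z_i \mapsto A_nZ_i$ can be verified numerically. The sketch below is illustrative only: it assumes that $S_n$ in (10) is the average of $(Z_i - \mu)(Z_i - \mu)^t$, and the function name `quad_stat` and the simulated data are hypothetical.

```python
import numpy as np

def quad_stat(z, mu):
    """T_n^* = n (zbar - mu)^t S_n^{-1} (zbar - mu).
    Assumption: S_n = n^{-1} sum_i (Z_i - mu)(Z_i - mu)^t."""
    n = z.shape[0]
    d = z.mean(axis=0) - mu
    zc = z - mu
    s_n = zc.T @ zc / n
    return n * d @ np.linalg.solve(s_n, d)

rng = np.random.default_rng(3)
z = rng.normal(size=(500, 3)) + np.array([1.0, -2.0, 0.5])
mu = np.array([1.0, -2.0, 0.5])
a = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # an arbitrary matrix, nonsingular here
t_original = quad_stat(z, mu)
t_transformed = quad_stat(z @ a.T, a @ mu)      # data A Z_i and parameter A mu
```

The two values agree up to rounding error, since $A_n$ cancels in $(A_nd)^t(A_nS_nA_n^t)^{-1}(A_nd) = d^tS_n^{-1}d$.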